Edit Distance

Edit Distance

·

4 min read

Given two strings word1 and word2, return the minimum number of operations required to convert word1 to word2.

You have the following three operations permitted on a word:

  • Insert a character
  • Delete a character
  • Replace a character

We could use dynamic programming to solve this problem. The basic idea of dynamic programming is to breaking a big problem into many smaller ones. Let's see how to break this one.

Say we have a string word1 with length m and another string word2 with length n, and we express the minimum edit distance as d(m, n).

First, let's consider the most simple scenarios. If m is 0, what is the value of d(0, n)? This is simple, we can just insert characters in word1 m times, or delete word2 n times. So the answer is n. It can't get any smaller. The same is when n is 0, so d(m, 0) is n. So if m is 0 and n is 0, obviously d(0, 0) is 0 too. Let‘s orgainze what we have now.

d(n, 0): n
d(0, m): m
d(0, 0): 0

Then let's consider another thing. Say we want to make the problem a little smaller, we delete the last character of word1. Now what is the value of d(n-1, m)? What is the difference between d(n, m) and d(n-1, m)? Since it's just a delete operation, the difference is 1. This should be the same when delete the last character of word2, So we could also have d(n, m) = d(n, m-1) + 1 too. Then comes the core part. What is the difference between d(n, m) and d(n-1, m-1)? This basicly says delete the last character of word1 and word2. The answer should not be hard, because if the last character is the same, then d(n-1, m-1) should be equal to d(n, m), if not, since it's only a substitution operation, the difference must be 1. Let‘s orgainze what we have now.

d(n, m) = d(n, m-1) + 1
d(n, m) = d(n-1, m) + 1

d(n, m) = d(n-1, m-1) (if last character is the same)
d(n, m) = d(n-1, m-1) + 1 (if last character is not the same)

So down the road, there are only three ways to get d(n, m), we can get all of them and take the minimum one.

d(n, m) = min(
  d(n, m-1) + 1,
  d(n-1, m) + 1,
  d(n-1, m-1) (if last character is the same)
  d(n-1, m-1) + 1 (if last character is not the same)
)

Since we have the formula above, we follow it.

function minDistance(word1: string, word2: string): number {
    const m = word1.length;
    const n = word2.length;

    if (m === 0) {
        return n;
    }
    if (n === 0) {
        return m;
    }
    if (word1[m - 1] === word2[n - 1]) {
        return minDistance(word1.slice(0, m - 1), word2.slice(0, n - 1));
    } else {
        return 1 + Math.min(
            minDistance(word1.slice(0, m - 1), word2),
            minDistance(word1, word2.slice(0, n - 1)),
            minDistance(word1.slice(0, m - 1), word2.slice(0, n - 1)),
        );
    }
};

You can see this code uses recursive method. If the word is long, this may take long time, and stack overflow may occur.

We can think about this formula in reverse order. We compute the samller ones first then the big ones ietratively. This is called iterative method.

function minDistance(word1: string, word2: string): number {
    const m = word1.length;
    const n = word2.length;
    if (m === 0) return n;
    if (n === 0) return m;

    // init a two dimensional array
    // dp[m][n] represents distance between word1 and word2
    // dp[1][1] represents distance between word1.slice(0,1), word2(0,1)
    const dp = Array.from({ length: m + 1 }, _ => Array.from({ length: n + 1 }, _ => 0));

    // if n === 0, then dp[m][n] = m
    for (let i = 0; i <= m; i++) {
        dp[i][0] = i;
    }

    // if m === 0, dp[0][n] = n
    for (let j = 0; j <= n; j++) {
        dp[0][j] = j;
    }

    for (let i = 1; i <= m; i++) {
        for (let j = 1; j <= n; j++) {

            const u = dp[i - 1][j] + 1; // up
            const l = dp[i][j - 1] + 1; // left
            const ul = dp[i - 1][j - 1]; // upleft

            if (word1[i - 1] === word2[j - 1]) {
                dp[i][j] = Math.min(u, l, ul);
            } else {
                dp[i][j] = Math.min(u, l, ul + 1);
            }
        }.
    }

    return dp[m][n];
};

At last, don't forget to use leetcode Edit Distance to test if your code is working