Longest Common Subsequence

Longest Common Subsequence

·

5 min read

Longest Common Subsequence is a problem in leetcode, but unlike some other normal problems in leetcode, this one is a real-world problem. It is used in Git diff command to check difference between 2 files.

Let's understand what is longest common subsequence first. A subsequence of a string is a new string generated from the original string. It is not required to be consecutive, so a, ab, ac, ad are all subsequence of abcd, as long as the order is the same as the original string. So given 2 strings, the step is find all subsequence for each string, and return longest one.

Recursive solution

Of course we could follow the description of the problem to use the brute-force approach to find all subsequences. But this is time-consuming. The key to solve this problem efficiently is to cut current problems into one or many smaller problems, then find the relations between the two.

So, if text1's length is m, and text2's length is n, consider the following 3 smaller problems:

current problem: lcs(text1, text2)

smaller problems:
- lcs(text1.slice(0, m-1), text2)
- lcs(text1, text2.slice(0, n-1))
- lcs(text1.slice(0, m-1), text2.slice(0, n-1))

What is the relation between current problem with the smaller ones? Well if text1[m-1]) === text2[n-1]), which means the last characters of each string are equal, then because we got 1 character matched, so we can get a relation: lcs(text1, text2) = 1 + lcs(text1.slice(0, m-1), text2.slice(0, n-1)). If the last character are not equal, then we have 2 options. Either delete the last character of text1, or delete the last character of text2, then try to check again. The question is to find the longest common subsequence, so what we want to do is calculate both options and choose the longest one. So we have another pattern: lcs(text1, text2) = max(lcs(text1.slice(0, m-1), text2), lcs(text1, text2.slice(0, n-1)).

With these 2 pattern, we could make current problem into smaller problems. Then do it recursively, until when either string is empty. Let's do it in code.

function longestCommonSubsequence(text1: string, text2: string): number {
    function lcs(i: number, j: number) {
        if (i === text1.length || j === text2.length) return 0;

        if (text1[i] === text2[j]) {
            return 1 + lcs(i + 1, j + 1);
        } else {
            return Math.max(lcs(i + 1, j), lcs(i, j + 1))
        }
    }
    return lcs(0, 0);
};

The code is clear and simple, but if we run it in leetcode, then we should get a Time Limit Exceeded failure. If we print the parameters of function lcs when it being called, then we should see that this function is called many time with the same parameters. So the problem is, with this recursive approach, we do a lot repetitive calculations.

Recursive with cache

The most intuitive way to solve this repetition is to use a cache. Whenever we find a same calculation task, then we can return result directly.

function longestCommonSubsequence(text1: string, text2: string): number {
    const m = Array.from({ length: text1.length + 1 }, _ => {
        return Array.from({ length: text2.length + 1 }, _ => -1);
    });

    function lcs(i: number, j: number): number {
        if (m[i][j] !== -1) return m[i][j];

        if (i === text1.length || j === text2.length) {
            m[i][j] = 0;
            return m[i][j];
        }

        if (text1[i] === text2[j]) {
            m[i][j] = 1 + lcs(i + 1, j + 1);
        } else {
            m[i][j] = Math.max(lcs(i + 1, j), lcs(i, j + 1));
        }

        return m[i][j];
    }

    return lcs(0, 0);
};

Try this solution in leetcode again, this should work.

Dynamic programming

Even if we have a cache, store results into cache and check cache many time still causes overhead.

Actually, based on the patterns we already have, we can folllow a bottom-up approach to make this code run faster further.

The idea is to do a reverse thinking. Instead of breaking big problem into smaller problems, we start with smaller problem first and build up to find answers for big problems.

So, start with empty string, we know if either string is empty, then the result is 0. Then based on the same pattern, we make the string longer and longer and store all temperary results in a 2-dimensional array. After we reach the length of 2 strings, then we get the final answer.

function longestCommonSubsequence(text1: string, text2: string): number {
    const m = Array.from({ length: text1.length + 1 }, _ => {
        return Array.from({ length: text2.length + 1 }, _ => -1);
    });

    for (let i = 0; i <= text1.length; i++) {
        m[i][0] = 0;
    }
    for (let j = 0; j <= text2.length; j++) {
        m[0][j] = 0;
    }

    // use the same pattern, but build up
    for (let i = 1; i <= text1.length; i++) {
        for (let j = 1; j <= text2.length; j++) {
            if (text1[i - 1] === text2[j - 1]) {
                m[i][j] = 1 + m[i - 1][j - 1];
            } else {
                m[i][j] = Math.max(m[i - 1][j], m[i][j - 1]);
            }
        }
    }

    return m[text1.length][text2.length];
};

Find LCS

Yes, we already solve the problem. But what we have now is just find the length of the longest common subsequence. So how can we find the string? Well, we already have the 2-dimensional array which stores all temporary results, we could start from the end, and follow the same pattern reversely, and get the longest common subsequence.

function longestCommonSubsequence(text1: string, text2: string): string {
    const m = Array.from({ length: text1.length + 1 }, _ => {
        return Array.from({ length: text2.length + 1 }, _ => -1);
    });

    for (let i = 0; i <= text1.length; i++) {
        m[i][0] = 0;
    }
    for (let j = 0; j <= text2.length; j++) {
        m[0][j] = 0;
    }

    for (let i = 1; i <= text1.length; i++) {
        for (let j = 1; j <= text2.length; j++) {
            if (text1[i - 1] === text2[j - 1]) {
                m[i][j] = 1 + m[i - 1][j - 1];
            } else {
                m[i][j] = Math.max(m[i - 1][j], m[i][j - 1]);
            }
        }
    }

    let x = text1.length;
    let y = text2.length;

    let lcsv: string[] = [];
    while (true) {
        if (x <= 0 || y <= 0 || m[x][y] === 0) break;

        if (m[x - 1][y] === m[x][y]) {
            x -= 1;

        } else if (m[x][y - 1] === m[x][y]) {
            y -= 1;

        } else {
            // push the character which affect the lcs's length
            lcsv.push(text1[x - 1]);

            x -= 1;
            y -= 1;
        }
    }

    lcsv.reverse();

    return lcsv.join("");
};