Algorithm Study -- Strings -- Longest Common Subsequence (LCS)

Date: 2020-01-18

Tags: algorithm study, strings, longest common subsequence, lcs


Definition

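If a new sequence T is obtained from a sequence S by deleting any number of characters (zero or more) while keeping the remaining characters in their original order, T is called a subsequence of S.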
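As a minimal sketch, the definition can be checked with a greedy two-pointer scan. The class and method names here are hypothetical and do not come from the original post. The scan walks through S and matches the characters of T in order; if all of T gets matched, T is a subsequence of S.

```java
public class SubsequenceCheck {
    // Hypothetical helper: returns true if t can be obtained from s by deleting
    // zero or more characters without reordering the rest (greedy two-pointer scan).
    static boolean isSubsequence(String t, String s) {
        int i = 0; // index of the next character of t that still needs a match
        for (int j = 0; j < s.length() && i < t.length(); j++) {
            if (s.charAt(j) == t.charAt(i)) {
                i++; // matched t[i] at s[j]; move on to the next character of t
            }
        }
        return i == t.length();
    }

    public static void main(String[] args) {
        System.out.println(isSubsequence("455", "12455"));   // true
        System.out.println(isSubsequence("455", "3455677")); // true
        System.out.println(isSubsequence("54", "12455"));    // false: order matters
    }
}
```

Matching greedily (always taking the earliest possible match in S) is safe: it never uses up a position that a later character of T would have needed.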

Among all common subsequences of two sequences X and Y, the longest one is defined as the longest common subsequence (LCS) of X and Y.

For example, the longest common subsequence of 12455 and 3455677 is 455.

Note the difference from the longest common substring, whose characters must be contiguous.
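The difference is easiest to see on a pair of strings where the two answers disagree. The sketch below computes both lengths for "abcde" and "ace"; the class and method names are hypothetical. The substring table resets to 0 on a mismatch because the run must stay contiguous. The subsequence table instead carries the best value forward from its neighbors, since gaps are allowed.

```java
public class SubstringVsSubsequence {
    // Length of the longest common SUBSTRING (contiguous characters).
    static int longestCommonSubstring(String a, String b) {
        int[][] dp = new int[a.length() + 1][b.length() + 1]; // dp[i][j]: run ending at a[i-1], b[j-1]
        int best = 0;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1; // extend the common run
                    best = Math.max(best, dp[i][j]);
                } // on a mismatch dp[i][j] stays 0: a contiguous run cannot skip characters
            }
        }
        return best;
    }

    // Length of the longest common SUBSEQUENCE (gaps allowed).
    static int longestCommonSubsequence(String a, String b) {
        int[][] dp = new int[a.length() + 1][b.length() + 1]; // dp[i][j]: LCS of the prefixes
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]); // skip a char of a or of b
                }
            }
        }
        return dp[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(longestCommonSubstring("abcde", "ace"));   // 1
        System.out.println(longestCommonSubsequence("abcde", "ace")); // 3 ("ace")
    }
}
```

For 12455 and 3455677 both functions return 3, because there the LCS 455 happens to be contiguous in both strings.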

Algorithms

(1) Brute-force enumeration (impractical)
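Brute force can be sketched as follows, assuming very short inputs; the class and method names are hypothetical. Each of the $2^m$ subsequences of X is encoded as a bitmask, and we keep the longest candidate that is also a subsequence of Y. Building and checking one candidate costs $O(m + n)$, so the total is $O(2^m (m + n))$. That exponential factor is why this approach is impractical.

```java
public class BruteForceLcs {
    // Two-pointer check: is t a subsequence of s?
    static boolean isSubsequence(String t, String s) {
        int i = 0;
        for (int j = 0; j < s.length() && i < t.length(); j++) {
            if (s.charAt(j) == t.charAt(i)) i++;
        }
        return i == t.length();
    }

    // Enumerates every subsequence of x (one bitmask per subset of positions)
    // and keeps the longest one that also occurs as a subsequence of y.
    // Assumes x.length() < 31 so that 1 << m fits in an int.
    static String lcsBruteForce(String x, String y) {
        int m = x.length();
        String best = "";
        for (int mask = 0; mask < (1 << m); mask++) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < m; i++) {
                if ((mask & (1 << i)) != 0) sb.append(x.charAt(i)); // bit i set: keep x[i]
            }
            String candidate = sb.toString();
            if (candidate.length() > best.length() && isSubsequence(candidate, y)) {
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(lcsBruteForce("12455", "3455677")); // 455
    }
}
```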

(2) Dynamic programming

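Suppose we have two sequences $X = x_1 x_2 \cdots x_m$ and $Y = y_1 y_2 \cdots y_n$. Let $X_i$ denote the first i characters of X and $Y_j$ the first j characters of Y, so that $X_m = X$ and $Y_n = Y$.

Let LCS(X, Y) denote the longest common subsequence of X and Y, and write it as $Z = z_1 z_2 \cdots z_k$.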

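If $x_m = y_n$ (the last characters are equal), then the last character of the LCS is $z_k = x_m = y_n$.

That is, $LCS(X_m, Y_n) = LCS(X_{m-1}, Y_{n-1}) + x_m$, where "+" means appending the character $x_m$.

If $x_m \ne y_n$ (the last characters differ), the LCS cannot end with both of them, so at least one of them can be dropped. The LCS is therefore either $LCS(X_{m-1}, Y_n)$ or $LCS(X_m, Y_{n-1})$.

That is, $LCS(X_m, Y_n) = \max\{LCS(X_{m-1}, Y_n),\ LCS(X_m, Y_{n-1})\}$, where max picks the longer of the two.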
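The two cases above deal with sequences. In practice we only need lengths, so write $c[i][j]$ for the length of $LCS(X_i, Y_j)$ (a notation introduced here for convenience). Adding the base case for empty prefixes gives the recurrence that the table in the code below fills in:

```latex
c[i][j] =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0,\\
c[i-1][j-1] + 1 & \text{if } i, j > 0 \text{ and } x_i = y_j,\\
\max\{c[i-1][j],\ c[i][j-1]\} & \text{if } i, j > 0 \text{ and } x_i \ne y_j,
\end{cases}
\qquad |LCS(X, Y)| = c[m][n].
```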

Related problem

Delete Operation for Two Strings

Given two words word1 and word2, find the minimum number of steps required to make word1 and word2 the same, where in each step you can delete one character in either string.

Example 1:

Input: "sea", "eat"
Output: 2
Explanation: You need one step to make "sea" to "ea" and another step to make "eat" to "ea".

Note:

The length of given words won't exceed 500.
Characters in given words can only be lower-case letters.

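This problem can actually be reduced to LCS. Let the two sequences s1 and s2 have lengths m and n. Whatever remains after the deletions must be a common subsequence of s1 and s2, so keeping the longest one minimizes the number of deletions. The answer is therefore m + n - 2*LCS(s1, s2), where LCS(s1, s2) here means the length of the LCS.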
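Checking the formula against Example 1: the LCS of "sea" and "eat" is "ea", so

```latex
m = |\text{sea}| = 3,\quad n = |\text{eat}| = 3,\quad |LCS| = |\text{ea}| = 2
\;\Rightarrow\; m + n - 2\,|LCS| = 3 + 3 - 2 \cdot 2 = 2,
```

which matches the expected output: one deletion from each string.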

Next, let us compute LCS(s1, s2), which means implementing the LCS algorithm described above.

Approach #1 Using Longest Common Subsequence [Time Limit Exceeded] (plain recursion)

public class Solution {
    public int minDistance(String s1, String s2) {
        return s1.length() + s2.length() - 2 * lcs(s1, s2, s1.length(), s2.length());
    }
    public int lcs(String s1, String s2, int m, int n) {
        if (m == 0 || n == 0)
            return 0;
        if (s1.charAt(m - 1) == s2.charAt(n - 1))
            return 1 + lcs(s1, s2, m - 1, n - 1);
        else
            return Math.max(lcs(s1, s2, m, n - 1), lcs(s1, s2, m - 1, n));
    }
}

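The time complexity of this method is exponential. In the worst case (for example, when the two strings share no characters) every call branches into two, which gives $O(2^{m+n})$. The space complexity is $O(\max(m, n))$ for the recursion stack. Many subproblems are computed over and over again, so the algorithm can be improved.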
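To see the repeated work concretely, the sketch below instruments the same recursion as Approach #1 with a call counter. The class, the counter field, and countCalls are hypothetical additions for illustration.

```java
public class RecursionCallCounter {
    static long calls = 0; // number of lcs invocations made so far

    // Same plain recursion as Approach #1, instrumented to count calls.
    static int lcs(String s1, String s2, int m, int n) {
        calls++;
        if (m == 0 || n == 0) return 0;
        if (s1.charAt(m - 1) == s2.charAt(n - 1)) return 1 + lcs(s1, s2, m - 1, n - 1);
        return Math.max(lcs(s1, s2, m, n - 1), lcs(s1, s2, m - 1, n));
    }

    // Resets the counter, runs the recursion once, and reports how many calls it made.
    static long countCalls(String s1, String s2) {
        calls = 0;
        lcs(s1, s2, s1.length(), s2.length());
        return calls;
    }

    public static void main(String[] args) {
        // No shared characters, so every call with m, n > 0 branches twice.
        System.out.println(countCalls("abc", "def"));               // 39
        System.out.println(countCalls("abcdefghij", "klmnopqrst")); // 369511
    }
}
```

When the strings share no characters, the call count satisfies T(m, n) = 1 + T(m, n-1) + T(m-1, n) with T(m, 0) = T(0, n) = 1, which works out to 2·C(m+n, m) - 1. For m = n = 10 that is 369511 calls, yet there are only 11 × 11 = 121 distinct (m, n) pairs to solve.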

Approach #2 Longest Common Subsequence with Memoization [Accepted] (trading space for time, still recursive)

import java.util.Arrays;

public class Solution {
    public int minDistance(String s1, String s2) {
        int[][] memo = new int[s1.length() + 1][s2.length() + 1];
        // -1 marks "not computed yet"; 0 is a valid LCS length, so it cannot be the marker
        for (int[] row : memo)
            Arrays.fill(row, -1);
        return s1.length() + s2.length() - 2 * lcs(s1, s2, s1.length(), s2.length(), memo);
    }
    public int lcs(String s1, String s2, int m, int n, int[][] memo) {
        if (m == 0 || n == 0)
            return 0;
        if (memo[m][n] >= 0) // already computed: reuse the stored result
            return memo[m][n];
        if (s1.charAt(m - 1) == s2.charAt(n - 1))
            memo[m][n] = 1 + lcs(s1, s2, m - 1, n - 1, memo);
        else
            memo[m][n] = Math.max(lcs(s1, s2, m, n - 1, memo), lcs(s1, s2, m - 1, n, memo));
        return memo[m][n];
    }
}

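The time complexity of this method is $O(mn)$ and the space complexity is $O(mn)$. It simply records already-computed values in a two-dimensional array to avoid recomputing them, but it still relies on recursion.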

Approach #3 Using Longest Common Subsequence- Dynamic Programming [Accepted] (trading space for time: each cell of the LCS(X_i, Y_j) table is computed from its neighboring cells)

public class Solution {
    public int minDistance(String s1, String s2) {
        int[][] dp = new int[s1.length() + 1][s2.length() + 1];
        for (int i = 0; i <= s1.length(); i++) {
            for (int j = 0; j <= s2.length(); j++) {
                if (i == 0 || j == 0)
                    continue;
                if (s1.charAt(i - 1) == s2.charAt(j - 1))
                    dp[i][j] = 1 + dp[i - 1][j - 1];
                else
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
            }
        }
        return s1.length() + s2.length() - 2 * dp[s1.length()][s2.length()];
    }
}

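The time complexity of this method is $O(mn)$ and the space complexity is $O(mn)$. Unlike Approach #2, the two-dimensional array is not filled in by recursion. Instead, every row and column is scanned in order and all values are computed directly (bottom-up).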
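The table from Approach #3 also contains enough information to recover an actual LCS, not just its length. The sketch below (class and method names are hypothetical) fills the same table and then walks back from dp[m][n]. When the characters match, that character belongs to the LCS. Otherwise the walk moves toward the neighbor that holds the larger value. If several LCSs exist, this returns one of them.

```java
public class LcsReconstruction {
    // Fills the same dp table as Approach #3, then backtracks to build one LCS.
    static String lcsString(String s1, String s2) {
        int m = s1.length(), n = s2.length();
        int[][] dp = new int[m + 1][n + 1];
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (s1.charAt(i - 1) == s2.charAt(j - 1))
                    dp[i][j] = 1 + dp[i - 1][j - 1];
                else
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
            }
        }
        StringBuilder sb = new StringBuilder();
        int i = m, j = n;
        while (i > 0 && j > 0) {
            if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
                sb.append(s1.charAt(i - 1)); // this character is part of the LCS
                i--;
                j--;
            } else if (dp[i - 1][j] >= dp[i][j - 1]) {
                i--; // dropping s1[i-1] keeps the longer (or equal) LCS
            } else {
                j--; // dropping s2[j-1] keeps the longer LCS
            }
        }
        return sb.reverse().toString(); // characters were collected from back to front
    }

    public static void main(String[] args) {
        System.out.println(lcsString("12455", "3455677")); // 455
        System.out.println(lcsString("sea", "eat"));       // ea
    }
}
```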

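Of course, the space complexity can be reduced further. Computing each row only needs values from the current row and the previous row, so two one-dimensional arrays of length n + 1 are enough. This lowers the space complexity to $O(n)$.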
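A minimal sketch of the two-row version, written in the same LeetCode-style Solution form as the approaches above. The array names prev and curr are my own choice. It computes exactly the same values as Approach #3, but only ever keeps rows i - 1 and i of the table.

```java
public class Solution {
    public int minDistance(String s1, String s2) {
        int m = s1.length(), n = s2.length();
        int[] prev = new int[n + 1]; // row i-1 of the dp table (row 0 is all zeros)
        int[] curr = new int[n + 1]; // row i, being filled in
        for (int i = 1; i <= m; i++) {
            curr[0] = 0; // dp[i][0] = 0: LCS with an empty prefix of s2
            for (int j = 1; j <= n; j++) {
                if (s1.charAt(i - 1) == s2.charAt(j - 1))
                    curr[j] = 1 + prev[j - 1];                // dp[i-1][j-1] + 1
                else
                    curr[j] = Math.max(prev[j], curr[j - 1]); // max(dp[i-1][j], dp[i][j-1])
            }
            int[] tmp = prev; // swap so that row i becomes the "previous" row;
            prev = curr;      // the old row is reused as scratch space for row i+1
            curr = tmp;
        }
        return m + n - 2 * prev[n]; // after the final swap, prev holds row m
    }
}
```

Swapping the two array references avoids allocating a new row on every iteration. The arrays are sized by s2, so passing the shorter string as s2 keeps memory at $O(\min(m, n))$.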