How to find Longest Common Substring using C++

Posted 2019-01-17 07:42

I searched online for a C++ Longest Common Substring implementation but failed to find a decent one. I need an LCS algorithm that returns the substring itself, not just its length.

I was wondering, though, about how I can do this between multiple strings.

My idea was to find the longest one between 2 strings and then check it against all the others, but this is a very slow process that requires keeping many long strings in memory, which makes my program quite slow.

Any idea how this can be sped up for multiple strings? Thank you.

Important edit: one of the variables I'm given determines the number of strings the longest common substring needs to appear in. So I can be given 10 strings and asked for the LCS of all of them (K=10), or the LCS of 4 of them, but I'm not told which 4; I have to find the best 4.

Reply #2 · 2019-01-17 08:13

Here is an excellent article on finding all common substrings efficiently, with examples in C. This may be overkill if you need just the longest, but it may be easier to understand than the general articles about suffix trees.

Reply #3 · 2019-01-17 08:13

Here is a C# version to find the Longest Common Substring using dynamic programming of two arrays (you may refer to: for more details)

using System;
using System.Collections.Generic;
using Microsoft.VisualStudio.TestTools.UnitTesting; // Assert; MSTest is an assumption

// Both members are assumed to sit inside a test class.
class LCSubstring
{
    public int Length = 0;
    public List<Tuple<int, int>> indices = new List<Tuple<int, int>>();
}
public string[] LongestCommonSubStrings(string A, string B)
{
    int[][] DP_LCSuffix_Cache = new int[A.Length + 1][];
    for (int i = 0; i <= A.Length; i++)
        DP_LCSuffix_Cache[i] = new int[B.Length + 1];
    LCSubstring lcsSubstring = new LCSubstring();
    for (int i = 1; i <= A.Length; i++)
    {
        for (int j = 1; j <= B.Length; j++)
        {
            //LCSuffix(Xi, Yj) = 0 if X[i] != Y[j]
            //                 = LCSuffix(Xi-1, Yj-1) + 1 if X[i] == Y[j]
            if (A[i - 1] == B[j - 1])
            {
                int lcSuffix = 1 + DP_LCSuffix_Cache[i - 1][j - 1];
                DP_LCSuffix_Cache[i][j] = lcSuffix;
                if (lcSuffix > lcsSubstring.Length)
                {
                    lcsSubstring.Length = lcSuffix;
                    lcsSubstring.indices.Clear(); //new maximum: drop shorter candidates
                    lcsSubstring.indices.Add(new Tuple<int, int>(i, j));
                }
                else if (lcSuffix == lcsSubstring.Length)
                {
                    //may be more than one longest common substring
                    lcsSubstring.indices.Add(new Tuple<int, int>(i, j));
                }
            }
            else
            {
                DP_LCSuffix_Cache[i][j] = 0;
            }
        }
    }
    if (lcsSubstring.Length > 0)
    {
        List<string> substrings = new List<string>();
        foreach (Tuple<int, int> indices in lcsSubstring.indices)
        {
            string s = string.Empty;
            int i = indices.Item1 - lcsSubstring.Length;
            int j = indices.Item2 - lcsSubstring.Length;
            Assert.IsTrue(DP_LCSuffix_Cache[i][j] == 0);
            for (int l = 0; l < lcsSubstring.Length; l++)
            {
                s += A[i];
                Assert.IsTrue(A[i] == B[j]);
                i++;
                j++;
            }
            Assert.IsTrue(i == indices.Item1);
            Assert.IsTrue(j == indices.Item2);
            Assert.IsTrue(DP_LCSuffix_Cache[i][j] == lcsSubstring.Length);
            substrings.Add(s);
        }
        return substrings.ToArray();
    }
    return new string[0];
}

Where unit tests are:

// needs: using System.Linq; (for Any)
[TestMethod] // MSTest attribute, assumed; use your framework's equivalent
public void LCSubstringTests()
{
    string A = "ABABC", B = "BABCA";
    string[] substrings = this.LongestCommonSubStrings(A, B);
    Assert.IsTrue(substrings.Length == 1);
    Assert.IsTrue(substrings[0] == "BABC");
    A = "ABCXYZ"; B = "XYZABC";
    substrings = this.LongestCommonSubStrings(A, B);
    Assert.IsTrue(substrings.Length == 2);
    Assert.IsTrue(substrings.Any(s => s == "ABC"));
    Assert.IsTrue(substrings.Any(s => s == "XYZ"));
    A = "ABC"; B = "UVWXYZ";
    string substring = "";
    for (int i = 1; i <= 10; i++)
    {
        // append the same digits to both strings; they form the only common run
        A += i;
        B += i;
        substring += i;
        substrings = this.LongestCommonSubStrings(A, B);
        Assert.IsTrue(substrings.Length == 1);
        Assert.IsTrue(substrings[0] == substring);
    }
}

Reply #4 · 2019-01-17 08:14

There is a very elegant Dynamic Programming solution to this.

Let LCSuff[i][j] be the length of the longest common suffix of X[1..i] and Y[1..j], where X has length m and Y has length n. We have two cases here:

  • X[i] == Y[j]: we can extend the longest common suffix of X[1..i-1] and Y[1..j-1] by one character. Thus LCSuff[i][j] = LCSuff[i-1][j-1] + 1 in this case.

  • X[i] != Y[j]: since the last characters themselves are different, X[1..i] and Y[1..j] can't have a common suffix. Hence, LCSuff[i][j] = 0 in this case.

We now need to take the maximum of these longest common suffixes.

So, LCSubstr(X,Y) = max(LCSuff[i][j]), where 1<=i<=m and 1<=j<=n

The algorithm pretty much writes itself now.

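#include <string>
#include <vector>
using std::string;

string LCSubstr(const string& x, const string& y){
    int m = x.length(), n = y.length();

    // (m+1) x (n+1) table, zero-initialised: row 0 and column 0 are the
    // empty-prefix base cases, so no separate initialisation loops are needed
    std::vector<std::vector<int>> LCSuff(m + 1, std::vector<int>(n + 1, 0));

    for(int i=1; i<=m; i++){
        for(int j=1; j<=n; j++){
            if(x[i-1] == y[j-1])
                LCSuff[i][j] = LCSuff[i-1][j-1] + 1;
            else
                LCSuff[i][j] = 0;
        }
    }

    string longest = "";
    for(int i=1; i<=m; i++){
        for(int j=1; j<=n; j++){
            // the match ending at x[i-1] starts at index i - LCSuff[i][j]
            if(LCSuff[i][j] > (int)longest.length())
                longest = x.substr(i - LCSuff[i][j], LCSuff[i][j]);
        }
    }
    return longest;
}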
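
Since the question mentions memory pressure, here is a minimal sketch of the same recurrence that keeps only two rows of the table instead of all of it. The function name lcSubstrTwoRows is made up for illustration; it tracks the best length and where that match ends in x while filling the rows, so no second pass over the table is needed.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: same LCSuff recurrence, but only the previous and the
// current row are stored, so memory is O(n) instead of O(m*n).
std::string lcSubstrTwoRows(const std::string& x, const std::string& y) {
    const std::size_t m = x.size(), n = y.size();
    std::vector<int> prev(n + 1, 0), cur(n + 1, 0);
    int best = 0;            // length of the best match found so far
    std::size_t bestEnd = 0; // one past the last character of that match in x

    for (std::size_t i = 1; i <= m; ++i) {
        for (std::size_t j = 1; j <= n; ++j) {
            if (x[i - 1] == y[j - 1]) {
                cur[j] = prev[j - 1] + 1; // extend the diagonal run
                if (cur[j] > best) {
                    best = cur[j];
                    bestEnd = i;
                }
            } else {
                cur[j] = 0;
            }
        }
        std::swap(prev, cur); // the current row becomes the previous one
    }
    return x.substr(bestEnd - best, best);
}
```

When several substrings tie for the longest, this returns the one found first while scanning x, so lcSubstrTwoRows("ABCXYZ", "XYZABC") gives "ABC".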
Reply #5 · 2019-01-17 08:17


You can build a generalised suffix tree from multiple strings.

Look at this

A suffix tree can be built in O(n) time for each string of length n, so k*O(n) in total, where k is the total number of strings.

So it's very quick to solve this problem.
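
A generalised suffix tree is too long to sketch in a reply, so the code below is not one. As a simpler stand-in for the "at least K of N strings" requirement from the question, it binary-searches the answer length and, for each candidate length, counts how many strings contain each substring of that length. The names commonOfLength and longestCommonOfK are hypothetical, and the approach copies many substrings, so it is likely slower and hungrier for memory than a suffix tree on large inputs.

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical helper: returns some substring of length len that occurs in at
// least k of the strings, or "" if there is none.
std::string commonOfLength(const std::vector<std::string>& strs,
                           std::size_t len, std::size_t k) {
    if (len == 0) return "";
    std::unordered_map<std::string, std::size_t> seenIn; // substring -> number of strings containing it
    for (const std::string& s : strs) {
        std::unordered_set<std::string> inThis; // count each substring once per string
        for (std::size_t i = 0; i + len <= s.size(); ++i) {
            std::string sub = s.substr(i, len);
            if (inThis.insert(sub).second && ++seenIn[sub] >= k) return sub;
        }
    }
    return "";
}

// Hypothetical helper: a longest substring shared by at least k of the strings.
std::string longestCommonOfK(const std::vector<std::string>& strs, std::size_t k) {
    if (k == 0 || k > strs.size()) return "";
    // The answer cannot be longer than the k-th longest input string.
    std::vector<std::size_t> lens;
    for (const std::string& s : strs) lens.push_back(s.size());
    std::sort(lens.begin(), lens.end(), std::greater<std::size_t>());
    std::size_t lo = 0, hi = lens[k - 1];
    std::string best;
    // If some length works, every shorter length works too (take a prefix),
    // so the feasible lengths form a range and can be binary-searched.
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo + 1) / 2;
        std::string found = commonOfLength(strs, mid, k);
        if (!found.empty()) {
            best = found;
            lo = mid;
        } else {
            hi = mid - 1;
        }
    }
    return best;
}
```

For example, with the strings "xabcdy", "zabcdw", "abq" and "cd", calling longestCommonOfK with k = 2 yields "abcd", while k = 4 yields an empty string because no character occurs in all four.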

Reply #6 · 2019-01-17 08:21

I tried several different solutions for this, but they all seemed really slow, so I came up with the one below. I didn't test it much, but it seems to work a bit faster for me.

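#include <iostream>
#include <string>

std::string lcs( std::string a, std::string b )
{
    if( a.empty() || b.empty() ) return {} ;

    std::string current_lcs = "";

    for(size_t i=0; i< a.length(); i++) {
        // try every position in b where the character a[i] occurs
        size_t fpos = b.find(a[i], 0);
        while(fpos != std::string::npos) {
            std::string tmp_lcs = "";
            tmp_lcs += a[i];
            // a single matching character is already a common substring
            if (current_lcs.empty()) current_lcs = tmp_lcs;
            // extend the candidate along b while it still occurs somewhere in a
            for (size_t x = fpos+1; x < b.length(); x++) {
                tmp_lcs += b[x];
                size_t spos = a.find(tmp_lcs, 0);
                if (spos == std::string::npos) {
                    break;
                } else {
                    if (tmp_lcs.length() > current_lcs.length()) {
                        current_lcs = tmp_lcs;
                    }
                }
            }
            fpos = b.find(a[i], fpos+1);
        }
    }
    return current_lcs;
}

int main(int argc, char** argv)
{
    if (argc < 3) {
        std::cerr << "usage: " << argv[0] << " <string1> <string2>" << std::endl;
        return 1;
    }
    std::cout << lcs(std::string(argv[1]), std::string(argv[2])) << std::endl;
}
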
Reply #7 · 2019-01-17 08:22

Find the largest substring from all strings under consideration. From N strings, you'll have N substrings. Choose the largest of those N.
