Shortest Longest Common Subsequence Code

11

Your task to to solve the SLCSC problem, which consists in finding the shortest possible code to solve the Longest Common Subsequence problem.

A valid solution to the LCS problem for two or more strings S1, … Sn is any string T of maximal length such that the characters of T appear in all Si, in the same order as in T.

Note that T does not have to be a substring of Si.

Example

The strings axbycz and xaybzc have 8 common subsequences of length 3:

abc abz ayc ayz xbc xbz xyc xyz

Any of these would be a valid solution for the LCS problem.

Details

Write a program or a function that solves the LCS problem, as explained above, abiding the following rules:

  • The input will consist of two or more strings containing only lowercase letters.

    You may read those strings as an array of strings or a single string with a delimiter of your choice.

  • Your code must output any single one of the possible solutions to the problem, optionally followed by a linefeed or surrounded by quotes.

  • You may assume that the strings are shorter than 1000 characters and that there are at most 20 strings.

    Within these limits, your code should work as expected in theory (given unlimited time and memory).

  • Your code must complete the combined test cases of the next section in under an hour on my machine (Intel Core i7-3770, 16 GiB RAM).

    Approaches that simply iterate over all possible subsequences will not comply with the time limit.

  • Using built-ins that trivialize this task, such as LongestCommonSequence, is not allowed.

Standard rules apply.

Test cases

a
ab

Output: a


aaaaxbbbb
bbbbxcccc
ccccxaaaa

Output: x


hxbecqibzpyqhosszypghkdddykjfjcajnwfmtfhqcpavqrtoipocijrmqpgzoufkjkyurczxuhkcpehbhpsuieqgjcepuhbpronmlrcgvipvibhuyqndbjbrrzpqbdegmqgjliclcgahxoouqxqpujsyfwbdthteidvigudsuoznykkzskspjufgkhaxorbrdvgodlb
qnnprxqpnafnhekcxljyysobbpyhynvolgtrntqtjpxpchqwgtkpxxvmwwcohxplsailheuzhkbtayvmxnttycdkbdvryjkfhshulptkuarqwuidrnjsydftsyhuueebnrjvkfvhqmyrclehcwethsqzcyfvyohzskvgttggndmdvdgollryqoswviqurrqhiqrqtyrl

Output: hxbbpyhogntqppcqgkxchpsieuhbncvpuqndbjqmclchqyfttdvgoysuhrrl or any other common subsequence of the same length


riikscwpvsbxrvkpymvbbpmwclizxlhihiubycxmxwviuajdzoonjpkgiuiulbjdpkztsqznhbjhymwzkutmkkkhirryabricvhb
jnrzutfnbqhbaueicsvltalvqoorafnadevojjcsnlftoskhntashydksoheydbeuwlswdzivxnvcrxbgxmquvdnfairsppodznm
kzimnldhqklxyezcuyjaiasaeslicegmnwfavllanoolkhvqkjdvxrsjapqqwnrplcwqginwinktxnkfcuuvoyntrqwwucarfvjg

Output: icsvllvjnlktywuar or any other common subsequence of the same length


rblartqkfggrjpiipuzzypkycnyckkcqceeujiyy
yfpnedyjtiwrhyuktripdwyvgkblzodeufkelhub
ywcyboxwqkibwvredkrbdsyudkulpvmhixeqxynh
bnxwahbzhwjxkotnvbxrojkkldtsodtjmvdrmbgr

Output: krkk or any other common subsequence of the same length


bwfscmotshoczmduwaev
coywaaizdaxjovipsmeh
dobzcpoiedenzlfdjpiu
bbhfeqscvvbwkuoxdoge
drqrzwbcpcsvneodelja

Output: code or any other common subsequence of the same length


nqrualgoedlf
jgqorzglfnpa
fgttvnogldfx
pgostsulyfug
sgnhoyjlnfvr
wdttgkolfkbt

Output: golf or any other common subsequence of the same length


epopyfuhgowedpiqpgfj
ddxyestqynpwmytxhozm
ptubgzyqqksieoovuezv
tovotqmnhgzttfpywjgr
aomvytkgaijlgqzkljls
vzvxpaixrnprxvenbbuo
syfuwxlpcyontqlmfvib
arxleuomstkjegpduwcx
xgqrxaopouvkvwgbzewn
yggunddigctgpnuztzai
izroomywwztdymqszsuo
kiawjnxjncdtufhkrjsp

Output: the empty string

Dennis

Posted 2015-07-08T18:42:32.580

Reputation: 196 637

@NotthatCharles Not all all. That question gives only two strings as input and has no time limit. All existing answers use naïve approaches that are magnitudes too slow to comply with this question's rules. – Dennis – 2015-07-08T22:57:11.977

The last example probably takes the longest time to compute, however by first removing every character which doesn't appear in every string it is trivial to output the empty string. Could you add another example with the same number of strings and length of strings, where every character used appears in every string and where the LCS is at least 5 characters or so? Something like: https://ghostbin.com/paste/x9caq

– Tyilo – 2015-07-09T00:43:48.377

@Tylio Incorporating some logic that ends recursion early if the strings have no more common characters is pretty much what the last test case is about. – Dennis – 2015-07-09T00:53:58.597

@Dennis So the solution shouldn't be able to run in reasonable time with 20 arbitrary length 1000 strings? – Tyilo – 2015-07-09T03:08:48.123

@Tyilo: Unless those strings don't share characters, I don't think it's even possible. I'll post a clarifying edit. – Dennis – 2015-07-09T03:10:49.117

@Dennis, ok my solution now exploits this when possible and it can now handle the last case very fast. – Tyilo – 2015-07-09T03:19:25.237

This is slightly tricky: I keep making Pyth segfault! – kirbyfan64sos – 2015-07-09T15:06:10.827

Answers

4

CJam, 31

q~L{_:&\f{_2$f#:).>j+}{,}$W>s}j

Try it online

9 bytes golfed thanks to Dennis!

Explanation:

This algorithm tries every possible character for the first position of the subsequence, replaces each string with the substring after the first occurrence of that character, and then calls itself recursively (with memoization).

q~          read and evaluate the input (taken as an array)
L{…}j       execute block with recursive memoization and no starting values
  _         duplicate the array of strings
  :&\       intersect the strings as character sets and move before the array
             these are all the possible characters for the sequence
  f{…}      for each character and the array
    _2$     duplicate the array and the character
    f#      find the character position in each string
    :)      increment the positions (to skip the character)
    .>      slice each string starting at the corresponding position
    j       call the j block recursively
    +       concatenate the starting character with the result
  {,}$      sort resulting strings (one for each character) by length
  W>        keep only the last element, if any
  s         convert (from 0/1-string array) to string

aditsu quit because SE is EVIL

Posted 2015-07-08T18:42:32.580

Reputation: 22 326

5

Python - 665 644

Indentation levels:

1: space
2: tab
3: tab + space
4: 2 tabs
5: 2 tabs + space

The code defines a function o, which takes a list of strings as arguments and returns one of the LCS's for the strings.

def o(t):
 t=[[y for y in x if y in reduce(lambda x,y:x.intersection(y),t,set(t[0]))]for x in t];l=map(len,t);C=[0]*reduce(lambda x,y:x*-~y,l,1);y=lambda z:[x-1for x in z];m=len(t);e=enumerate
 def g(h):
    r,x=0,1
    for k,j in e(h):r+=-~j*x;x*=-~l[k]
    return r
 def f(h):
    i=len(h)
    if i==m:
     b=g(h);c=t[0][h[0]]
     for k,j in e(h):
         if t[k][j]!=c:break
     else:C[b]=1+C[g(y(h))];return
     r=0
     for k,_ in e(h):a=h[:];a[k]-=1;r=max(r,C[g(a)])
     C[b]=r;return
    for j,_ in e(t[i]):f(h+[j])
 def p(h):
    if min(h)==-1:return''
    v=C[g(h)]
    for k,_ in e(h):
        a=h[:];a[k]-=1
        if v==C[g(a)]:return p(a)
    return p(y(h))+t[0][h[0]]
 f([]);return p(y(l))

Test code:

tests = [
"""
a
ab
""",
"""
aaaaxbbbb
bbbbxcccc
ccccxaaaa
""",
"""
hxbecqibzpyqhosszypghkdddykjfjcajnwfmtfhqcpavqrtoipocijrmqpgzoufkjkyurczxuhkcpehbhpsuieqgjcepuhbpronmlrcgvipvibhuyqndbjbrrzpqbdegmqgjliclcgahxoouqxqpujsyfwbdthteidvigudsuoznykkzskspjufgkhaxorbrdvgodlb
qnnprxqpnafnhekcxljyysobbpyhynvolgtrntqtjpxpchqwgtkpxxvmwwcohxplsailheuzhkbtayvmxnttycdkbdvryjkfhshulptkuarqwuidrnjsydftsyhuueebnrjvkfvhqmyrclehcwethsqzcyfvyohzskvgttggndmdvdgollryqoswviqurrqhiqrqtyrl
""",
"""
riikscwpvsbxrvkpymvbbpmwclizxlhihiubycxmxwviuajdzoonjpkgiuiulbjdpkztsqznhbjhymwzkutmkkkhirryabricvhb
jnrzutfnbqhbaueicsvltalvqoorafnadevojjcsnlftoskhntashydksoheydbeuwlswdzivxnvcrxbgxmquvdnfairsppodznm
kzimnldhqklxyezcuyjaiasaeslicegmnwfavllanoolkhvqkjdvxrsjapqqwnrplcwqginwinktxnkfcuuvoyntrqwwucarfvjg
""",
"""
rblartqkfggrjpiipuzzypkycnyckkcqceeujiyy
yfpnedyjtiwrhyuktripdwyvgkblzodeufkelhub
ywcyboxwqkibwvredkrbdsyudkulpvmhixeqxynh
bnxwahbzhwjxkotnvbxrojkkldtsodtjmvdrmbgr
""",
"""
bwfscmotshoczmduwaev
coywaaizdaxjovipsmeh
dobzcpoiedenzlfdjpiu
bbhfeqscvvbwkuoxdoge
drqrzwbcpcsvneodelja
""",
"""
nqrualgoedlf
jgqorzglfnpa
fgttvnogldfx
pgostsulyfug
sgnhoyjlnfvr
wdttgkolfkbt
""",
"""
epopyfuhgowedpiqpgfj
ddxyestqynpwmytxhozm
ptubgzyqqksieoovuezv
tovotqmnhgzttfpywjgr
aomvytkgaijlgqzkljls
vzvxpaixrnprxvenbbuo
syfuwxlpcyontqlmfvib
arxleuomstkjegpduwcx
xgqrxaopouvkvwgbzewn
yggunddigctgpnuztzai
izroomywwztdymqszsuo
kiawjnxjncdtufhkrjsp
"""
]

for s in tests:
 print o(s.strip().split())

Time it takes to run the tests on my computer:

$ time python 52808-shortest-longest-common-subsequence-code-golfed.py
a
x
hecbpyhogntqtpcqgkxchpsieuhbncvhuqndbjqmclchqyfhtdvgoysuhrrl
icsvllvanlktywuar
krkk
code
golf

        9.03 real         8.99 user         0.03 sys

Tyilo

Posted 2015-07-08T18:42:32.580

Reputation: 1 372

1You should add a byte to get your code to 666 bytes. So metal. \m/ – Alex A. – 2015-07-09T16:29:43.763

@AlexA. Yeah I also noticed that when counting the bytes as it included a newline on the last line. – Tyilo – 2015-07-09T16:31:36.697

There are a few small improvements I see right away that should help. Firstly, anywhere you have a (n+1), you can replace it with -~n to save 2 bytes in each case. Also, anywhere you use map with a lambda, consider using list comprehension instead. For example, map(lambda x:x-1,z) can be shortened by three bytes by changing it to [~-x for x in z]. – Kade – 2015-07-09T20:38:40.967

r,x=r+(j+1)*x,x*(l[k]+1) can be shortened to r+=(j+1)*x;x*=(l[k]+1). Also, you don't need u=... because u is only used in one place. Just substitute that code for the letter u. – mbomb007 – 2015-07-10T19:25:41.047

@Vioz- and mbomb007 Thanks. – Tyilo – 2015-07-11T10:34:10.917

4

Pyth, 59 58 55 35 bytes

L&@Fb?+yPMbeeb@FeMbeolNmyXJbdP@bdlb

Cut a whopping 20 bytes thanks to @isaacg!

55 byte version:

DCHR?k!&.AH@FH?+Cm<1dHeeHql{medH1h.MlZ.eC++<Hk]<1b>HhkH

Cut off 3 bytes by changing .U@bZ to @F (the fold operator).

58 byte version:

DCHR?k!&.AH.U@bZH?+Cm<1dHeeHql{medH1h.MlZ.eC++<Hk]<1b>HhkH

Cut off a byte by changing the format of a boolean conditional.

59 byte version:

DCHR?k|!.AH!.U@bZH?+Cm<1dHeeHql{medH1h.MlZ.eC++<Hk]<1b>HhkH

This was hard! Python kept segfaulting! Pretty sure it was some kind of bug, but I couldn't get a minimal test case. Oh well.

I based the algorithm off of this one. Which is fine, except that that one is designed for only two strings. I had to tweak it a bit to make it work on more. Then, the last test case was taking too long, so I had to add an additional check to exit the recursion if there are no more common characters.

It's pretty slow, but should take less than an hour (hopefully). I'm testing on my Core i3 with 6 GB of RAM, so your 16-GB Core i7 should blow through this. :)

I also took advantage of Pyth's auto-memoizing functions to make it a bit faster.

EDIT: @Dennis said it passes!

To test it, add the following line:

CQ

and give it a list of strings via standard input (e.g. ['a', 'ab']).

Explanation for the 35 byte version:

WIP.

Explanation for the 55 byte version:

DCH                                                        define a function C that takes a list of strings H
   R                                                       return the following expression
    ?                                                      if
      !&.AH@FH                                             there are no more common letters OR all the strings are empty
     k                                                     return the empty string
              ?          ql{medH1                          else if the last character of every string is equal
               +Cm<1dHeeH                                  return the result of adding the last character to recursion with every item without its last character
                                 h.MlZ.eC++<Hk]<1b>HhkH    otherwise, return the largest result of recursing len(H) times, each time with one element's last character cut off

kirbyfan64sos

Posted 2015-07-08T18:42:32.580

Reputation: 8 730

@Dennis Ok; I'll work on it. – kirbyfan64sos – 2015-07-09T19:23:47.643

@Dennis Updated. You can try again now. – kirbyfan64sos – 2015-07-09T19:33:58.870

The last test case finishes instantly now. – Dennis – 2015-07-09T19:38:18.257

@Dennis YESSSSS!! – kirbyfan64sos – 2015-07-09T19:42:16.167

@kirbyfan64sos About the segfaults: Pyth segfaults when the recursion depth goes too high, such as on infinite recursion. – isaacg – 2015-07-09T23:47:31.557

@isaacg Ahh...that makes sense. It seems to raise the recursion limit to a really high number. – kirbyfan64sos – 2015-07-09T23:57:34.353

@kirbyfan64sos I golfed your solution down by 20 bytes, to 35 bytes: L&@Fb?+yPMbeeb@FeMbeolNmyXJbdP@bdlb Would you like to incorporate it into your answer, or should I post it separately? – isaacg – 2015-07-10T00:56:24.927

@isaacg I can edit it in, but I'd kind of like to know the changes first... Thanks! :) – kirbyfan64sos – 2015-07-10T00:57:31.877

Sure. Here are the changes: I removed .AH, which was redundant with @FH. ?k!@FH -> &@FH. <1 -> P. mPd -> PM. med -> eM. ql{eMH1 -> @FeMb. h.MlZ -> eolN. .eC++<Hk]Pb>HhkK -> myXJbdP@bdlb (changed from manual list construction to X, assign-at). DCHR -> L, and changed C to y and H to b. The function you call is now y. – isaacg – 2015-07-10T01:10:05.140

4

C, 618 564 bytes

d,M,N,A[9999][2];char*(R[9999][20]),b[1000];L(char**s,n){char*j[20],c,a=0;int x[n],y=n-1,z,i,t,m=0,w=1;for(;y;)x[y--]=999;for(;y<N;y++){for(i=0;i<n&&s[i]==R[y][i];i++);if(i/n){a=A[y][0];m=A[y][1];w=0;if(m+d<M||!a)goto J;else{c=a;goto K;}}}for(c=97;w&&c<'{';c++){K:t=1,y=1,z=1;for(i=0;i<n;j[i++]++){for(j[i]=s[i];*j[i]-c;j[i]++)t&=!!*j[i];y&=j[i]-s[i]>x[i]?z=0,1:0;}t&=!y;I:if(t){if(z)for(i=0;i<n;i++)x[i]=j[i]-s[i];d++,t+=L(j,n),d--,m=t>m?a=c,t:m;}}if(w){for(y=0;y<n;y++)R[N][y]=s[y];A[N][0]=a;A[N++][1]=m;}J:if(d+m>=M)M=d+m,b[d]=a;if(!d)N=0,M=0,puts(b);return m;}

And here it is unraveled, for "readability":

d,M,N,A[9999][2];
char*(R[9999][20]),b[1000];
L(char**s,n){
    char*j[20],c,a=0;
    int x[n],y=n-1,z,i,t,m=0,w=1;
    for(;y;)
        x[y--]=999;
    for(;y<N;y++){
        for(i=0;i<n&&s[i]==R[y][i];i++);
        if(i/n){
            a=A[y][0];
            m=A[y][1];
            w=0;
            if(m+d<M||!a)
                goto J;
            else{
                c=a;
                goto K;
            }
        }
    }
    for(c=97;w&&c<'{';c++){
        K:
        t=1,
        y=1,
        z=1;
        for(i=0;i<n;j[i++]++){
            for(j[i]=s[i];*j[i]-c;j[i]++)
                t&=!!*j[i];
            y&=j[i]-s[i]>x[i]?z=0,1:0;
        }
        t&=!y;
        I:
        if(t){
            if(z)
                for(i=0;i<n;i++)
                    x[i]=j[i]-s[i];
            d++,
            t+=L(j,n),
            d--,
            m=t>m?a=c,t:m;
        }
    }
    if(w){
        for(y=0;y<n;y++)R[N][y]=s[y];
        A[N][0]=a;
        A[N++][1]=m;
    }
    J:
    if(d+m>=M)
        M=d+m,b[d]=a;
    if(!d)
        N=0,M=0,puts(b);
    return m;
}

Ladies and gentlemen, I've made a horrible mistake. It used to be prettier... And goto-less... At least now it is fast.

We define a recursive function L that takes as input an array s of arrays of characters and the number n of strings. The function outputs the resulting string to stdout, and incidentally returns the size in characters of that string.

The Approach

Though the code is convoluted, the strategy here is not all too complex. We start with a rather naive recursive algorithm, which I will describe with pseudocode:

Function L (array of strings s, number of strings n), returns length:

Create array of strings j of size n;

For each character c in "a-z",
    For each integer i less than n,
         Set the i'th string of j to the i'th string of s, starting at the first appearance of c in s[i]. (e.g. j[i][0] == c)
         If c does not occur in the i'th string of s, continue on to the next c.
    end For

    new_length := L( j, n ) + 1; // (C) t = new_length
    if new_length > best_length
        best_character := c; // (C) a = best_character
        best_length := new_length; // (C) m = best_length
    end if
end For

// (C) d = current_depth_in_recursion_tree
if best_length + current_depth_in_recursion_tree >= best_found
     prepend best_character to output_string // (C) b = output_string
     // (C) M = best_found, which represents the longest common substring found at any given point in the execution.
     best_found = best_length + current_depth;
end if

if current_depth_in_recursion_tree == 0
    reset all variables, print output_string
end if 

return best_length

Now, this algorithm on its own is pretty atrocious (but can be fit in around ~230 bytes, I've found). This is not how one gets speedy results. This algorithm scales incredibly poorly with string length. This algorithm does, however, scale fairly well with larger numbers of strings. The last test case would be solved virtually instantly, since no strings in s have any characters c in common. There were two main tricks I've implemented above that resulted in an incredible speed increase:

  • At every call to L, check if we've been given this same input before. Since in practice information is passed around via pointers to the same set of strings, we don't actually have to compare strings, just locations, which is great. If we find that we've gotten this information before, there's no need to run through the calculations (most of the time, but getting output makes this a bit more complicated) and we can get away with just returning the length. If we do not find a match, save this set of input/output to compare to future calls. In the C code, the second for loop attempts to find matches to the input. Known input pointers are saved in R, and the corresponding length and character output values are stored in A. This plan had a drastic effect on runtime, especially with longer strings.

  • Every time we find the locations of c in s, there's a chance we know right off the bat that what we've found isn't optimal. If every location of c appears after some known location of another letter, we automatically know that this c does not lead to an optimal substring, because you can fit one more letter in it. This means that for a small cost, we can potentially remove several hundred calls to L for large strings. In the above C code, y is a flag set if we automatically know that this character leads to a suboptimal string, and z is a flag set if we find a character that has exclusively earlier appearances than any other known character. The current earliest appearances of characters are stored in x. The current implementation of this idea is a bit messy, but nearly doubled performance in many instances.

With these two ideas, what didn't finish in an hour now took about 0.015 seconds.

There are probably plenty more little tricks that can speed up performance, but at this point I started worrying about my ability to golf everything. I'm still not content with the golf, so I'll likely come back to this later!

Timings

Here's some testing code, which I invite you to try online:

#include "stdio.h"
#include "time.h"

#define SIZE_ARRAY(x) (sizeof(x) / sizeof(*x))

int main(int argc, char** argv) {
    /* Our test case */
    char* test7[] = {
        "nqrualgoedlf",
        "jgqorzglfnpa",
        "fgttvnogldfx",
        "pgostsulyfug",
        "sgnhoyjlnfvr",
        "wdttgkolfkbt"
    };

    printf("Test 7:\n\t");
    clock_t start = clock();

    /* The call to L */
    int size = L(test7, SIZE_ARRAY(test7));


    double dt = ((double)(clock() - start)) / CLOCKS_PER_SEC;
    printf("\tSize: %d\n", size);
    printf("\tElapsed time: %lf s\n", dt);

    return 0;
}

I ran the OP's test cases on a laptop equipped with a 1.7 GHz Intel Core i7 chip, with an optimization setting of -Ofast. The simulation reported a peak of 712KB required. Here's an example run of each test case, with timings:

Test 1:
    a
    Size: 1
    Elapsed time: 0.000020 s
Test 2:
    x
    Size: 1
    Elapsed time: 0.000017 s
Test 3:
    hecbpyhogntqppcqgkxchpsieuhbmcbhuqdjbrqmclchqyfhtdvdoysuhrrl
    Size: 60
    Elapsed time: 0.054547 s
Test 4:
    ihicvaoodsnktkrar
    Size: 17
    Elapsed time: 0.007459 s
Test 5:
    krkk
    Size: 4
    Elapsed time: 0.000051 s
Test 6:
    code
    Size: 4
    Elapsed time: 0.000045 s
Test 7:
    golf
    Size: 4
    Elapsed time: 0.000040 s
Test 8:

    Size: 0
    Elapsed time: 0.000029 s


Total time: 0.062293 s

In golfing, I hit performance rather significantly, and since people seemed to like the brute speed (0.013624 s to complete all test cases combined) of my previous 618-byte solution, I'll leave it here for reference:

d,M,N,A[9999][2];char*(R[9999][20]),b[1000];L(char**s,n){char*j[20],c,a=0;int x[n],y,z,i,t,m=0,w=1;for(y=0;y<n;y++)x[y]=999;for(y=0;y<N;y++){for(i=0;i<n;i++)if(s[i]!=R[y][i])break;if(i==n){a=A[y][0];m=A[y][1];w=0;if(m+d<M||!a)goto J;else{c=a;goto K;}}}for(c=97;w&&c<'{';c++){K:t=1,y=1,z=1;for(i=0;i<n;j[i++]++){for(j[i]=s[i];*j[i]-c;j[i]++)if(!*j[i]){t=0;goto I;}if(j[i]-s[i]>x[i])z=0;if(j[i]-s[i]<x[i])y=0;}if(y){t=0;}I:if(t){if(z){for(i=0;i<n;i++){x[i]=j[i]-s[i];}}d++,t+=L(j,n),d--,m=t>m?(a=c),t:m;}}if(w){for(y=0;y<n;y++)R[N][y]=s[y];A[N][0]=a;A[N++][1]=m;}J:if(d+m>=M)M=d+m,b[d]=a;if(!d)N=0,M=0,puts(b);return m;}

The algorithm itself is unchanged, but the new code relies on divisions and some trickier bitwise operations that end up slowing the whole thing down.

BrainSteel

Posted 2015-07-08T18:42:32.580

Reputation: 5 132

I was thinking about posting a fastest-code challenge about a similar topic, but it looks like I don't have to anymore. 0.01s and 712KB is just astonishing. – Dennis – 2015-07-09T21:46:55.573

This is simply amazing! – kirbyfan64sos – 2015-07-10T01:02:19.930

Looking over your explanation, what the heck is best_found? It's only mentioned twice, once when it's used in the conditional, and the other when it's reset. – kirbyfan64sos – 2015-08-06T23:29:13.770

Looking over the C source, it seems that best_found is set to best_length + current_depth. You should probably mention that in the explanation! – kirbyfan64sos – 2015-08-06T23:31:22.197

@kirbyfan64sos best_found is a global integer that describes the length of the longest common substring found at any given point in execution. I'll put that in the explanation! – BrainSteel – 2015-08-09T16:17:49.557

1

Python 2, 285

Code:

import re
def f(s,a,b):
  if b==[]:return s+f('',[],a)
  if a==[]:return s+max([f(b[0][i],[b[0][i+1:]],b[1:]) for i in range(len(b[0]))],key=len) if b[0]!='' else ''
  return max([f(s,a+[b[0][i.start()+1:]],b[1:]) for i in re.finditer(s[-1],b[0])],key=len) if ~b[0].find(s[-1]) else ''

Usage:

print f('',[],['axbycz','xaybzc'])

Explanation:

This is a recursive function. s is the character we are looking for. a contains list of strings sliced after s. b contains list of strings untouced yet. f returns the longest common string.

The first condition checks if we finished to go through all the strings. if so, it means s is common character, and it returns s and looking for more common characters.

The second condition checks if we don't start to go through any string, meaning we don't even have a character (a==[] is equivalent to s==''). if so we check each character of the first string in b.

The last line moves the first string in b to a, by finding each occurrence of s in this string.

On the first call, s should be empty string. a should be empty list and b should contains all the strings.

TheCrypt

Posted 2015-07-08T18:42:32.580

Reputation: 141

2You should use default arguments so that only the strings need to be supplied to the function, like f(b,s='',a=[]). – feersum – 2015-07-10T12:26:42.223