Longest Common Subsequence of Two Strings containing a particular Substring

6

A subsequence is a sequence that can be derived from another sequence by deleting some elements without changing the order of the remaining elements.

Given three strings A, B and C (note the order) as input, find the length of the longest string S such that:

  • Condition 1: S is a subsequence of A
  • Condition 2: S is a subsequence of B
  • Condition 3: C is a substring of S

.

Testcase #1

Input:

ABCDEF

CDEFAB

B

Output:

2

Testcase #2

Input:

HELLO

WORLD

SUCKS

Output:

0

Testcase #3

Input:

ABCXD

BCDEF

CD

Ouput:

3

Testcase #4

Input:

X

Y

Z

Ouput:

0

Note:

  • The empty string is regarded as a subsequence as well as a substring of every string.
  • Assume all input strings are in uppercase.
  • Return 0 as output if any of the first two conditions (Condition 1 and Condition 2) is not going to be satisfied using Condition 3.

Constraints:

  • Assume the lengths of A, B and C will always be between 1 and 3000 (both inclusive)

Winner:

One with the shortest code.

Coding man

Posted 2014-02-09T05:02:24.613

Reputation: 1 193

Either your spec is wrong or your sole test case is wrong. This question needs a definition (or at least a reference to a definition) of subsequence, and to be a good question it also needs test cases with better coverage (at a minimum, where S is not a substring of A or B). – Peter Taylor – 2014-02-09T08:46:27.037

@PeterTaylor Why do you think the example is wrong? AB is the longest subsequence which fulfills all three conditions. (It also happends to be a substring of A and B but that doesn't exclude it from the set of solutions.) I agree that the choice of the test case is not sufficient for any case but at least it isn't wrong. – Howard – 2014-02-09T10:04:01.800

My apologies, I misread the test case and got it into my mind that the third parameter was C. As @Howard says, it is correct. – Peter Taylor – 2014-02-09T14:51:55.777

1Why would case#2 and #4 return 0? Error fallback in case there's no matching sequence? 'SUCKS' isn't a substring of any subsequence, and even if as you say '' is a substring of every string/subsequence, there's nothing saying that '' should be compared...? – Joachim Isaksson – 2014-02-09T15:24:55.767

@JoachimIsaksson My mistake. Return 0 as output if any of the first two conditions (Condition 1 and Condition 2) is not going to be satisfied using Condition 3. – Coding man – 2014-02-09T15:55:59.540

Answers

4

APL (45)

⌈/0,⍴¨m/⍨∨/¨⍞∘⍷¨m←⊃∩/{/∘⍵¨↓⍉(z/2)⊤⍳2*z←⍴⍵}¨⍞⍞

Explanation:

  • ¨⍞⍞: read two lines. For each of them:
    • ⍳2*z←⍴⍵: get the numbers from 1 up to 2^length.
    • (z/2)⊤: encode each number as bits
    • ↓⍉: separate each string of bits into its own array
    • /∘⍵¨: and use each of them as a bitmask on the string, giving the subsequences
  • m←⊃∩/: take the intersection of the two arrays of subsequences, store in m
  • ∨/¨⍞∘⍷¨: read a third line, and see in which subsequences it occurs
  • m/⍨: select those elements from m that contained the third line
  • ⍴¨: get the length of each remaining subsequence
  • 0,: add a zero entry in case the resulting list is empty
  • ⌈/: get the highest number in the list

Test cases:

      ⌈/0,⍴¨m/⍨∨/¨⍞∘⍷¨m←⊃∩/{/∘⍵¨↓⍉(z/2)⊤⍳2*z←⍴⍵}¨⍞⍞
ABCDEF
CDEFAB
B
 2 
      ⌈/0,⍴¨m/⍨∨/¨⍞∘⍷¨m←⊃∩/{/∘⍵¨↓⍉(z/2)⊤⍳2*z←⍴⍵}¨⍞⍞
HELLO
WORLD
SUCKS
 0
      ⌈/0,⍴¨m/⍨∨/¨⍞∘⍷¨m←⊃∩/{/∘⍵¨↓⍉(z/2)⊤⍳2*z←⍴⍵}¨⍞⍞
ABCXD
BCDEF
CD
 3 
      ⌈/0,⍴¨m/⍨∨/¨⍞∘⍷¨m←⊃∩/{/∘⍵¨↓⍉(z/2)⊤⍳2*z←⍴⍵}¨⍞⍞
X
Y
Z
 0

marinus

Posted 2014-02-09T05:02:24.613

Reputation: 30 224

Yep, soundly beat it was, I can't even figure out how to type those characters... :) +1 – Joachim Isaksson – 2014-02-09T16:24:55.100

4

SWI-Prolog, 166

s(A,[H|R]):-(A=[H|T],s(T,R);s(A,R)).
s([],[]).
r(A,B,C,O):-(r(A,B,C,0,O),!;O=0).
r(A,B,C,M,O):-s(S,A),s(S,B),append([_,C,_],S),length(S,L),L>M,!,(r(A,B,C,L,O),!;L=O).

Usage:

?- r("abcxd","bcdef","cd",O).
O = 3.

n̴̖̋h̷͉̃a̷̭̿h̸̡̅ẗ̵̨́d̷̰̀ĥ̷̳

Posted 2014-02-09T05:02:24.613

Reputation: 5 683

3

Brachylog, 12 bytes, language postdates challenge

k⊇ᵐ=h.s~t?∨0

Try it online!

This is a function. The TIO link uses a command-line argument that allows functions to be run like full programs.

Explanation

k⊇ᵐ=h.s~t?∨0
k             Take the input, apart from the last element.
 ⊇ᵐ           Find the longest subsequence of each element
   =          such that those subsequences are equal
    h         and that subsequence
      s       has a substring
       ~t?    that's the last element of the input.
     .        If you can, return the subsequence.
          ∨0  If all else fails, return 0.

user62131

Posted 2014-02-09T05:02:24.613

Reputation:

3

Python (181)

Will probably be beat soundly by more operator rich languages, but since I so far have nothing to beat... :)

EDIT: I originally read substring where it said subsequence, this is the subsequence version;

from itertools import*
def f(a,b,c):
 f=lambda x:set(''.join(z)for y in range(len(x))for z in combinations(x,y))
 return max(len(x)for x in f(a).intersection(f(b))if c in x or not x)

What it does is basically to create 2 sets of all possible subsequences of the string a and the string b, intersecting them to find the common ones.

It then simply finds all resulting strings that contain the string c and prints the length of the longest matching string.

Joachim Isaksson

Posted 2014-02-09T05:02:24.613

Reputation: 1 161

See my comment on the question. Because it didn't define subsequence, you've wasted your time. A string of length n has up to 2^n subsequences, and e.g. f("ABCDEF", "ACE", "C") should return 3 because "ACE" is a subsequence of "ABCDEF".

– Peter Taylor – 2014-02-09T08:54:07.800

@PeterTaylor Yes, quite possibly I'll need to update this, I read subsequence as substring. – Joachim Isaksson – 2014-02-09T08:58:32.797

Your entry has rich operators. But, like Mathematica, its operators (more precisely, functions) have long names. – DavidC – 2014-02-09T16:36:07.633

2

GolfScript, 45 characters

n/)`{?)}+\{['']\1/{`{1$\+}+%}/}/&\,{,}%0+$-1=

Takes input as three lines on STDIN. Take a test run here.

Short explanation of the code:

n/     # split the input at newline into three strings
)`           # take the last one and stringify it
{?)}+        # and build a code block to filter for any string
             # which has that one as a substring (for later use)
\{           # now loop over the remaining two strings and 
             # build their subsequences
  ['']       # start with the set of the empty string
  \1/{       # loop over each character of the string
    `{       # apply code block to each member of the set
      1$\+   # copy member and add the current character
    }+%      # -> set is updated with latest character
  }/
}/
&            # take the common subsequences
\,           # apply the filter block from above
{,}%         # from each string take length
0+           # add zero (if empty set was returned)
$-1=         # sort and take last element (largest)

Howard

Posted 2014-02-09T05:02:24.613

Reputation: 23 109

2

Haskell, 90 81

Haskell provides everything you need in Data.List:

s=subsequences
f a b c=maximum$map length$filter(isInfixOf c)$intersect(s a)(s b)

If only the function names were shorter :)

Mike Fairhurst

Posted 2014-02-09T05:02:24.613

Reputation: 121

You can get it to 81 bytes if you rename subsequences and remove the space before and after the =. – Kaya – 2014-02-10T05:12:45.357

1

Brachylog, 13 bytes

~c~gᵗ⟨⊇ᵛs⟩l|∧

Try it online!

I didn't bother scrolling down far enough to notice that ais had already answered in Brachylog before I wrote this, but it looks like his answer doesn't actually output the length. It's way shorter doing what it does (it could be golfed down to the 10-byte k⊇ᵛ.s~t?∨ with the verify metapredicate and by leaving the output in case of failure to simply default to 0, and then every other integer), but if you try to get it to output the length it ends up tying this (while still making a lot more sense than anything that handles a set number of arguments with ~c!): k⊇ᵛXs~t?∧Xl.∨

A remarkably confusing explanation of my original solution:

~c               A partition of
                 the input
    ᵗ            such that its last element
  ~g             is a list containing one element
  ~gᵗ            which is replaced with that element
     ⟨   ⟩       is a list of two elements
          l      such that the implicit variable the length of which
           |     is the output
     ⟨⊇ᵛ ⟩       is a subsequence of both elements of the first element of the partition
     ⟨  s⟩       and has the last element of the partition as a substring.
           |     If it's not possible to find such a subsequence,
            ∧    leave the output variable unconstrained, so it will default to 0.

~chᵗ⟨⊇ᵛs⟩l|∧ is one byte shorter, but provides an incorrect output when the second element of the input is a substring of a subsequence of the first, like this or this. ~ctᵗ⟨⊇ᵛs⟩l|∧ completely disregards the second element of the input.

Unrelated String

Posted 2014-02-09T05:02:24.613

Reputation: 5 300

1

Mathematica 150 125

f[{a_,b_,c_}]:=({z=Subsets[Characters@#]&,s};s=Select[""<>#&/@(z@a⋂z@b),!StringFreeQ[#,c]&];
If[s=={},0,StringLength[s[[-1]]]])
  • Characters converts a string into a list of characters.
  • Subsets finds all the subsequences (held as lists of characters).
  • s refers to the list of ALL common subsequences (not the same a S).
  • s[[-1]] is the last (and largest) common subsequence; it is equal to S as defined in the question.

Test Cases

f[{"ABCDEF", "CDEFAB", "B"}]

2


f[{"HELLO", "WORLD", "SUCKS"}]

0


f[{"ABCXD", "BCDEF", "CD"}]

3


f[{"X", "Y", "Z"}]

0

DavidC

Posted 2014-02-09T05:02:24.613

Reputation: 24 524

1

Perl, 122

chomp(($A,$B,$C)=<>);$A=~s/(.)/($1)?.*?/g;$_=join'',map{$$_}1..$#-and/$C/&&$%<($-=length)and$%=$-while$B=~/.*?$A/g;print$%

The program expects 3 strings on STDIN and prints a number to STDOUT. Here it is re-written as sub for easier testing:

use feature 'say';
sub L{
    ($A,$B,$C,$%)=(@_,0);
    $A=~s/(.)/($1)?.*?/g;
    $_=join'',map{$$_}1..$#-and/$C/&&$%<($-=length)and$%=$-while$B=~/.*?$A/g;
    $%
}
say L(qw/ABCDEF CDEFAB B/);
say L(qw/HELLO WORLD SUCKS/);
say L(qw/ABCXD BCDEF CD/);
say L(qw/X Y Z/);

and

perl LCSoTS.pl
2
0
3
0

So, 122 even with ugly byte-consuming chomping at the start. I hope no bugs crept in while I golfed it to this size. And not sure about scalability, but let it be the question of hardware :-).

user2846289

Posted 2014-02-09T05:02:24.613

Reputation: 1 541