Wrap-Around Subsequences

11

Introduction

In this challenge, your task is to find generalized subsequences of strings. The subsequences are not necessarily contiguous, and they can also "wrap around" the string, going past its end and starting again from the beginning. You'll want to minimize the number of wraps, though.

More formally, let u and v be any two strings, and k ≥ 0 an integer. We say that u is a k-wrapping subsequence of v, if there are distinct indices i1, i2, ..., ilen(u) such that u == v[i1] v[i2] ... v[ilen(u)], and at most k of the indices ij satisfy ij > ij+1. This means that u can be found inside v by going from left to right, choosing some of its characters on the way, and wrapping around at most k times (equivalently, doing at most k+1 sweeps across v). Note that no character can be chosen more than once, even after a wrap-around, and that 0-wrapping subsequences are exactly the ordinary subsequences that we are all familiar with.

The Task

Your inputs are two non-empty alphanumeric strings u and v, and your output is the smallest integer k such that u is a k-wrapping subsequence of v. If no such k exists, the output shall be -1.

Example

Consider the inputs u := xyzyxzzxyx and v := yxzzazzyxxxyz. If we start looking for the characters of u in v in a greedy fashion, we will wrap around 3 times:

 yxzzazzyxxxyz
>─x─────y────z┐
┌─────────────┘
└y───────x────┐
┌─────────────┘
└──zz─────x─y─┐
┌─────────────┘
└──────────x──>

Thus the correct output is at most 3. Note how the left-most character x is selected once, and then ignored on the second sweep, since it cannot be re-used. However, there exists a shorter method with only 2 wrap-arounds:

 yxzzazzyxxxyz
>──────────xyz┐
┌─────────────┘
└yxzz────x────┐
┌─────────────┘
└───────y─x───>

It turns out that one wrap-around (that is, two sweeps) is not enough, so the correct output is 2.

Rules and Bonuses

You can write either a function or a full program, and you can also change the order of the inputs if needed. The lowest byte count wins, and standard loopholes are disallowed.

There is a bonus of -10% for computing all of the test cases in under 10 seconds total. I will test unclear cases on my machine; my reference implementation in Python takes about 0.6 seconds. I have a 7-year old laptop with 1.86 GHz dual core CPU, which you may want to take into account.

Test Cases

"me" "moe" -> 0
"meet" "metro" -> -1
"ababa" "abaab" -> 1
"abaab" "baabaa" -> 1
"1c1C1C2B" "1111CCCcB2" -> 3
"reverse" "reserved" -> 2
"abcdefg" "gfedcba" -> 6
"xyzyxzzxyx" "yxzzazzyxxxyz" -> 2
"aasdffdaasdf" "asdfddasdfsdaafsds" -> 2

Zgarb

Posted 2015-04-12T13:13:13.173

Reputation: 39 083

1

Would this also be a valid solution for the example? It's a greedy approach.

– orlp – 2015-04-12T16:59:00.703

@orlp It is not valid, because the first x is used in three distinct sweeps. It can only be used once. – Zgarb – 2015-04-12T17:01:13.990

Ahhh, I see now. – orlp – 2015-04-12T17:15:44.700

Answers

4

Pyth, 34 bytes

Mh+Smssm>.ukC,dtdfqGsm@HkT.PUHlG_1

This defines a function g, which takes two strings as parameter. Try it online: Pyth Compiler/Executor

This code is very inefficient. It has a time and memory complexity of len(v)!/(len(v)-len(u))!. It is not able to solve the longer test cases in under 10 seconds. (It also will crash very likely, since it will run out of memory.)

M                                    define g(G, H): return _
                          .PUHlG        all permutations of [0, 1, ..., len(H)-1] of length len(G)
                 fqGsm@HkT              filter the permutations which form the string G
    mssm>.ukC,dtd                       compute the number of wraps for each of the remaining permutations
  +S                            _1      sort the numbers and append -1
 h                                      return the first element

Jakube

Posted 2015-04-12T13:13:13.173

Reputation: 21 462

4

Haskell, 160 * 0.9 = 144 bytes

a#(-1)=a
a#b=min a b
f y=w(y++" ")0$length y
w _ n _[]=n
w(c:d)n o g@(a:b)|n>o=(-1)|a==c=z#w y n z g|c==' '=w y(n+1)o g|1<2=w y n o g where z=w d n o b;y=d++[c]

Timing for all test cases (note: arguments are flipped):

*Main> map (uncurry f) [
             ("moe", "me"),
             ("metro", "meet"),
             ("abaab", "ababa"),
             ("baabaa", "abaab"),
             ("1111CCCcB2", "1c1C1C2B"),
             ("reserved", "reverse"),
             ("gfedcba", "abcdefg"),
             ("yxzzazzyxxxyz", "xyzyxzzxyx"),
             ("asdfddasdfsdaafsds", "aasdffdaasdf")]
[0,-1,1,1,3,2,6,2,2]
(0.08 secs, 25794240 bytes)

How it works (short version): simple brute force that takes the minimum of using a matching character and skipping it. I stop the search when either finished (returning the number of cycles) or when cycled more than the minimum so far (returning -1).

Saved a lot of bytes compared to my first version, mainly because I switched from a full program to a function.

With some comments and proper spacing golfed Haskell is quite readable:

-- a minimum function that ignores a -1 in the right argument to prevent
-- "not solvable" cases in parts of the recursive search to dominate low numbers
-- of solvable parts. If the case isn't solvabale at all, both arguments are
-- -1 and are carried on.
a # (-1) = a
a # b    = min a b

-- the main function f calls the worker funktion w with arguments
-- * the string to search in (STSI), appended by a space to detect cycles
-- * the number of cycles so far
-- * the minimum of cycles needed so far, starting with the length of STSI
-- * the string to search for (STSF) (partial applied away and therefore invisible)
f y = w (y++" ") 0 (length y)

-- the worker function 
w _ n _ [] = n          -- base case: if STSF is empty the work is done and the 
                        -- number of cycles is returned

w (c:d) n o g@(a:b)     -- "c" is first char of STSI, "d" the rest
                        -- "n" number of cycles, "o" minimum of cycles so far
                        -- "g" is the whole STSF, "a" the 1st char, "b" the rest
  | n>o    = (-1)             -- if current cycle is more than a previous result,
                              -- indicate failure
  | a==c   = z # w y n z g    -- if there's a character match, take the min of
                              -- using it and skipping it
  | c==' ' = w y (n+1) o g    -- cycle detected, repeat and adjust n
  | 1<2    = w y n o g        -- otherwise try next char in STSI

  where                 -- just some golfing: short names for common subexpressions
  z = w d n o b;        -- number of cycles if a matching char is used
  y = d ++ [c]          -- rotated STSI

For reference: old version, full program, 187 bytes

main=interact$show.f.lines
a#(-1)=a
a#b=min a b
f[x,y]=w x(y++" ")0 0
w[]_ n _=n
w g@(a:b)(c:d)n m|a==c=w b d n 1#y|c==' '&&m==1=w g(d++" ")(n+1)0|c==' '=(-1)|1<2=y where y=w g(d++[c])n m

nimi

Posted 2015-04-12T13:13:13.173

Reputation: 34 639

@Zgarb: reworked my solution. It's now faster and shorter. – nimi – 2015-04-13T18:07:42.217

Runs in 0.6s when interpreted, 0.01s when compiled. – Zgarb – 2015-04-13T18:11:00.150

2

JavaScript (ES6) 174 (193 - 10%)

Recursive search, like the @nimi's answer, keeping the min of wraps. The solutions space is big (above all for the last example) but cutting the search at the min currently found keeps the time low. Edit 1 Add a missing test case, shortened a bit Edit 2 No need to pass param w around, it's fixed

K=(w,s,x)=>
  ~-(R=(r,l,p=0,q=1,z=w[p],i=0)=>
  {
    if(z&&!(q>x)){
      if(~(r+l).indexOf(z))
        for(t=l?R(l+r,'',p,q+1):x;x<t?0:x=t,i=~r.indexOf(z,-i);)
          t=R(r.slice(-i),l+r.slice(0,~i),p+1,q);
      q=x
    }
    return q
  })(s,'')

Ungolfed

K=(word, astring)=>
{
  var minWraps // undefined at first. All numeric comparison with undefined give false 
  var R=(right, left, pos, wraps)=>
  {
    var cur = word[pos]
    var i,t;
    if (! cur) // when all chars of word are managed
      return wraps;
    if (wraps > minWraps) // over the minimum wrap count already found, stop search
      return wraps; 
    if ( (right+left).indexOf(cur) < 0 ) // if the current char is not found in the remaining part of the string
      return minWraps; // return the current min, could still be undefined (that means 'no way')
    if ( left ) // if there is a left part, try a wrapping search with the current char
    {
      t = R(left+right, '', pos, wraps+1)
      if ( !(minWraps < t)) minWraps = t; // set current min if t is less than current min or current min is still undefined
    }
    // find all occurrences of current char in the remaining part
    // for each occurrence, start a recursive search for the next char
    for(i = 0; (i = right.indexOf(cur, i)) >= 0; i++)
    {
      var passed = right.slice(0,i) // the passed chars go in the left part
      var rest = right.slice(i+1) 
      t = R(rest, left+passed, pos+1, wraps) // try next char in the remaining part, no wrap
      if ( !(minWraps < t)) minWraps = t; // set current min if t is less than current min or current min is still undefined
    }
    return minWraps
  }
  var result = R(astring, '', 0, 1) // start with right=string and left empty
  return ~-result; // decrement. convert undefined to -1
}

Test In Firefox/FireBug console

time=~new Date;
[['me','moe']
,['meet','metro']
,['ababa','abaab']
,['abaab','baabaa']
,['1c1C1C2B','1111CCCcB2']
,['reverse','reserved']
,['abcdefg','gfedcba']
,['xyzyxzzxyx','yxzzazzyxxxyz']
,['aasdffdaasdf','asdfddasdfsdaafsds']]
.forEach(s=>console.log(s,r=K(...s)))
time-=~new Date

Output (last line is the execution time in ms)

["me", "moe"] 0
["meet", "metro"] -1
["ababa", "abaab"] 1
["abaab", "baabaa"] 1
["1c1C1C2B", "1111CCCcB2"] 3
["reverse", "reserved"] 2
["abcdefg", "gfedcba"] 6
["xyzyxzzxyx", "yxzzazzyxxxyz"] 2
["aasdffdaasdf", "asdfddasdfsdaafsds"] 2
116

edc65

Posted 2015-04-12T13:13:13.173

Reputation: 31 086

Tested with Firebug, runs in 175ms on my machine. – Zgarb – 2015-04-13T16:49:49.280

@Zgarb then there is room for improvements: I'll try to make it slower and shorter – edc65 – 2015-04-13T17:16:10.110