Find the optimal pattern

33

4

Given a string s composed of lowercase letters, such as

aabaaababbbbaaba

and a positive integer n, such as 4, output a length-n string t such that when t is repeated to the length of s, they have as many chars in common as possible. For the given example, the optimal output would be aaba, because it has thirteen chars in common with the target string:

s: aabaaababbbbaaba
t: aabaaabaaabaaaba (aaba)
   ^^^^^^^^  ^ ^^^^

and no possible t has more. However, for aaaaaab, there are two possible outputs: aaaa and aaba, which each have 6 chars in common with the target string:

s: aaaaaab
t: aaaaaaaa (aaaa)
   ^^^^^^ 

s: aaaaaab
t: aabaaaba (aaba)
   ^^ ^^^^

Either aaaa or aaba can be outputted, or both if you'd like. Note that s is not ever repeated; the trailing a in both repeated values of t is simply ignored.

Test cases

Inputs -> Valid outputs
1 a -> a
1 aa -> a
2 aa -> aa
1 ab -> a b
2 ab -> ab
1 abb -> b
2 abb -> ab bb
2 ababa -> ab
2 abcba -> ab
2 aabbbbb -> bb  (ab is not a valid output here)
3 aababba -> aab abb
3 aababbaa -> aab
3 asdasfadf -> asf
3 asdasfadfsdf -> asf adf
2 abcdefghijklmnopqrstuvwxyzyx -> yx
2 supercalifragilisticexpialidocious -> ic ii
3 supercalifragilisticexpialidocious -> iri ili ioi
4 supercalifragilisticexpialidocious -> scii
5 supercalifragilisticexpialidocious -> iapic
2 eeeebaadbaecaebbbbbebbbbeecacebdccaecadbbbaceebedbbbddadebeddedbcedeaadcabdeccceccaeaadbbaecbbcbcbea -> bb be
10 bbbbacacbcedecdbbbdebdaedcecdabcebddbdcecebbeeaacdebdbebaebcecddadeeedbbdbbaeaaeebbedbeeaeedadeecbcd -> ebbbdbeece ebdbdbeece
20 aabbbaaabaaabaaaabbbbabbbbabbbabbbbbabbaaaababbbaababbbaababaaaabbaaabbaabbbabaaabbabbaaabbaaaaaaaba -> aabbbbaaabbabbbaabba

Rules

  • You may assume the input will only ever be a non-empty string of lowercase letters and a positive integer no greater than the length of the string.
  • You may take the inputs in any standard format and in either order.
  • You may output a single string, or more than one in the form of an array, separated by newlines or spaces, etc.
  • Your code must finish for each test-case in less than 1 minute on any fairly modern computer.
  • This is , so make your code as short as possible.

ETHproductions

Posted 2017-03-22T16:16:48.477

Reputation: 47 880

2This challenge is Zgarb-quality. Nice work! – Martin Ender – 2017-03-22T16:39:47.650

I'm assuming only trailing characters are ignored? So you aren't allowed to ignore leading characters like this: 2 abb -> ba where it's built up as (b)[ab]a: leading (b) is ignored, [ab] are matching. – Kevin Cruijssen – 2017-03-23T10:50:13.673

@KevinCruijssen Right, the pattern must start repeating at the beginning. – ETHproductions – 2017-03-23T12:05:01.990

Answers

11

Jelly, 11 bytes

sZµṢŒrUṀṪµ€

Try it online!

Wasn't expecting to beat Dennis on this one, so tried to FGITW it (after trying several possibilities; there's more than one way to make 11). I came in shorter, much to my surprise.

Takes the string then the count as command-line arguments. Outputs on stdout.

Explanation

sZµṢŒrUṀṪµ€
s            Split {the first input} into {the second input}-sized groups
 Z           Transpose
  µ      µ€  On each of the transposed groups:
   Ṣ           Sort it;
    Œr         Run-length encode it;
      U        Rearrange it to the form {count, letter};
       Ṁ       Take the largest element (i.e. largest count)
        Ṫ      Take the second element of the pair (i.e. just the letter)

This uses the insight that the letter in each position of the pattern must be the most common letter corresponding to that position. We can find the letters corresponding to a particular pattern via splitting into pattern-sized groups, and transposing. The main reason this solution is so long is that Jelly doesn't seem to have a short way to find the mode of a list (I made several attempts, but they're all at least six bytes long).

Jelly, 10 bytes, based on @Dennis' solution

⁸ċ$ÞṪ
sZÇ€

Try it online!

This is a combination of @Dennis' solution and my own; there was a five-byte mode in that solution, which I stole for this solution. (I already had solutions based on ⁸ċ, but couldn't get below six bytes with it; I hadn't thought of using Þ.)

Explanation

µ…µ€ and Ç€ (with the on the previous line) are both three bytes long (the latter needs a newline), and equivalent. Normally I use the former, but the latter's more flexible, as it allows you to use to mention the argument.

This makes it possible to sort (Þ) by the number of occurrences in (⁸ċ), then take the last element (), to find the mode in just five characters.

user62131

Posted 2017-03-22T16:16:48.477

Reputation:

5Nice job outgolfing Dennis with his own language! :P – HyperNeutrino – 2017-03-22T17:57:34.507

10

Mathematica, 51 bytes

#&@@@Commonest/@(PadRight@Partition[#2,UpTo@#])&

Input and output are lists of characters.

Also based on the modes of the lines of the transpose. I believe they called the built-in for the mode of a list Commonest solely to spite code golfers.

Martin Ender

Posted 2017-03-22T16:16:48.477

Reputation: 184 808

At least it's a byte shorter than MostCommon... – ETHproductions – 2017-03-23T17:41:40.857

7

Python 3, 99, 73 61 bytes

-12, thx to @Rod

lambda s,n:''.join(max(s,key=s[i::n].count)for i in range(n))

Same idea, but rewrote it to eliminate the import statement.

lambda s,n:''.join(max(s,key=lambda c:s[i::n].count(c))for i in range(n))

Original

from collections import*
lambda s,n:''.join(Counter(s[i::n]).most_common(1)[0][0]for i in range(n))

Explanation:

s[i::n]                  a slice of every nth character of s, starting at position i

Counter(s[i::n])         counts the characters in the slice
  .most_common()         returns a list of (character, count) pairs, sorted by decreasing count
    [0][0]               grabs the letter from the first pair (i.e., the most common letter
      for i in range(n)  repeat for all starting positions

''.join                  combines the most common letters into a single string

RootTwo

Posted 2017-03-22T16:16:48.477

Reputation: 1 749

you can switch to python2.7 and drop the ''.join() to return a list of strings – Rod – 2017-03-22T19:10:00.677

@Rod Dropping ''.join(...) would make it return a generator, not sure if that's allowed output. – L3viathan – 2017-03-22T19:11:23.767

@L3viathan it need to be python2.7 to work, added to the other comment – Rod – 2017-03-22T19:22:51.377

Can you write some explanation of how this works? – Dead Possum – 2017-03-22T19:32:35.317

Also you can replace key=lambda c:s[i::n].count(c) with key=s[i::n].count (probably) – Rod – 2017-03-22T19:36:07.090

2@Rod A list of strings is only allowed in the question for if you return all possible solutions. That's what I took it to mean. – mbomb007 – 2017-03-22T21:49:06.933

5

Python 2, 106

Now it's a different answer! I was thinking about one(almost)-liner from the beggining. Now even shorter, based on zip usage by @Rod.

Thanks to @L3viathan and @Rod for clarification about using lambdas as answer

Try it online

lambda S,N:max(combinations(S,N),key=lambda s:sum(x==y for x,y in zip(S,s*len(S))))
from itertools import*

Explanation:

combinations(S,N) creates all combinations length N from characters of S

max() have argument key which takes as input function to use to compare elements

lambda s:sum(x==y for x,y in zip(S,s*len(S))) passed as such function

This lambda counts number of matching characters in list of tuples, produces by zip(S,s*len(S))

s - one of combinations and it's multipled by len(S) which creates string that is guaranteed longer than S

zip creates tuples of characters of each string S and s*len(S) and ignores all characters that can't be matched (in case of one string longer than another)

So max chooses combination, that produce maximum sum

Dead Possum

Posted 2017-03-22T16:16:48.477

Reputation: 3 256

1you don't need to use [] on list comprehension inside functions, also you are using 1 for ... if <cond> you can use directly <cond> for ... since it will be used on sum, python will take True as 1 and False as 0 – Rod – 2017-03-22T17:09:56.097

@Rod Thank you! If I would sqeeze my answer more, it will transform into your answer, approach is same :D So I'm trying something different right now – Dead Possum – 2017-03-22T17:15:11.210

Yep, just saying so you can use on your future answers :3 – Rod – 2017-03-22T17:25:58.260

@Rod I'm still learning python and find that PPCG is a good place to learn tricks and little nuances. At least for python. Btw, I've just beat you :D – Dead Possum – 2017-03-22T17:36:13.667

¯\(ツ) – Rod – 2017-03-22T17:54:50.180

1Switching to a lambda will save 7 bytes. – L3viathan – 2017-03-22T19:09:33.963

@L3viathan and it will not output anything. Would it be valid entry? – Dead Possum – 2017-03-22T19:13:41.847

1

@DeadPossum He meant this (note the footer and the header) and yes, a function is a valid answer, if its a lambda you don't even need the f= (unless it's recursive)

– Rod – 2017-03-22T19:21:43.107

@Rod Oh, that's great! Not intuitive, but I'll accept that :) – Dead Possum – 2017-03-22T19:24:51.537

5

JavaScript (ES6), 104 101 94 bytes

(n,s)=>s.replace(/./g,(_,i)=>[...s].map((c,j,a)=>j%n-i||(a[c]=-~a[c])>m&&(m++,r=c),m=r=``)&&r)

Saved 3 bytes twice thanks to @Arnauld. 97-byte solution that works with all non-newline characters:

(n,s)=>s.replace(/./g,(_,i)=>[...s].map((c,j)=>j%n-i||(o[c]=-~o[c])>m&&(m++,r=c),m=r=``,o={})&&r)

The previous 104-byte solution works with newline characters too:

(n,s)=>[...Array(n)].map((_,i)=>[...s].map((c,j)=>j%n-i||(o[c]=-~o[c])>m&&(m++,r=c),m=0,o={})&&r).join``

Neil

Posted 2017-03-22T16:16:48.477

Reputation: 95 035

Very nice. I golfed a solution for reference when adding test cases and came up at 122 bytes, looping through every char, saving the counts in an array of objects, then building the string from that array. – ETHproductions – 2017-03-22T17:41:29.363

Rather than initializing o to a new object, could you just reuse the array passed to map by using its 3rd parameter? – Arnauld – 2017-03-23T11:05:10.163

@Arnauld Hmm, I guess that works because the question guarantees lowercase letters, so I won't confuse the array elements with the counts... – Neil – 2017-03-23T11:15:53.947

I think (n,s)=>s.replace(/./g,(_,i)=>i<n?[...s].map((c,j,a)=>j%n-i||(a[c]=-~a[c])>m&&(m++,r=c),m=0)&&r:'') should save 3 more bytes. (Or 4 bytes by using currying syntax.) – Arnauld – 2017-03-23T11:55:14.267

@Arnauld Not bad, but I shaved a further two bytes off. (And also fixed my byte counts; a trailing newline was throwing them off.) – Neil – 2017-03-23T12:44:57.913

... and another two bytes bite the dust. – Neil – 2017-03-23T12:47:03.577

... or perhaps I should say they byte the dust... – Neil – 2017-03-23T15:00:25.447

3

Jelly, 12 11 bytes

s@ZċþZMḢ$€ị

Try it online!

How it works

s@ZċþZMḢ$€ị  Main link. Arguments: n (integer), s (string)

s@           Split swapped; split s into chunks of length n.
  Z          Zip/transpose, grouping characters that correspond to repetitions.
   ċþ        Count table; for each slice in the previous result, and each character
             in s, count the occurrences of the character in the group.
             This groups by character.
     Z       Zip/transpose to group by slice.
        $€   Map the two-link chain to the left over the groups.
      M        Find all maximal indices.
       Ḣ       Head; pick the first.
          ị  Index into s to retrieve the corresponding characters.

Dennis

Posted 2017-03-22T16:16:48.477

Reputation: 196 637

Does Jelly have comments? – caird coinheringaahing – 2017-03-23T22:05:26.243

No, it does not. – Dennis – 2017-03-23T22:10:43.160

2

Pyth, 11 bytes

meo/dNd.TcF

Takes input as s,n and outputs as a list of characters.

Explanation

meo/dNd.TcF
         cFQ   Split s into chunks of length n.
       .T      Transpose.
m o/dNd        Sort characters in each string by frequency.
 e             Take the most common.

user48543

Posted 2017-03-22T16:16:48.477

Reputation:

2

Japt, 16 15 bytes

Saved 1 byte thanks to @obarakon

Ç=VëUZ)¬ñ!èZ o

14 bytes of code + 1 byte for the -P flag. Try it online!

Ungolfed and explanation

 Ç   =VëUZ)¬ ñ!èZ o
UoZ{Z=VëUZ)q ñ!èZ o}
                          Implicit: U = input number, V = input string
Uo                        Create the range [0...U).
  Z{               }      Map each item Z by this function:
      VëUZ                  Take every U'th char of V, starting at index Z.
    Z=    )                 Call the result Z.
           q                Split the result into chars.
             ñ!èZ           Sort each char X by the number of occurrences of X in Z.
                  o         Pop; grab the last item (the most common char).
                      -P  Join the results (array of most common chars) into a string.

ETHproductions

Posted 2017-03-22T16:16:48.477

Reputation: 47 880

I think you can replace gJ with o – Oliver – 2017-03-22T18:53:38.427

@obarakon That's genius, thanks! – ETHproductions – 2017-03-22T19:09:41.750

1

Python 2, 132 bytes

from itertools import*
p,k=input()
b=l=len(p)
for i in combinations(p,k):
 x=sum(x!=y for x,y in zip(p,i*l))
 if x<b:b,o=x,i
print o

Try it online!

Rod

Posted 2017-03-22T16:16:48.477

Reputation: 17 588

1

05AB1E, 17 bytes

Iôð«øvy{.¡é®èÙJðÜ

Try it online!

Explanation

Iô                 # split 2nd input in chunks of 1st input size
  ð«               # append a space to each
    ø              # zip
     vy            # for each y in the zipped list
       {           # sort the string
        .¡         # group into chunks of consecutive equal elements
          é        # sort by length
           ®è      # pop the last element (the longest)
             Ù     # remove duplicate characters from the string
              J    # join the stack into one string
               ðÜ  # remove any trailing spaces

Emigna

Posted 2017-03-22T16:16:48.477

Reputation: 50 798

1

PHP, 245 Bytes

function p($c,$s,$r=""){global$a;if(($c-strlen($r)))foreach(str_split(count_chars($s,3))as$l)p($c,$s,$r.$l);else{for($v=str_pad("",$w=strlen($s),$r);$z<$w;)$t+=$v[$z]==$s[$z++];$a[$t][]=$r;}}p($argv[1],$argv[2]);ksort($a);echo join(" ",end($a));

Online Version

Breakdown

function p($c,$s,$r=""){
    global$a;
    if(($c-strlen($r)))  # make permutation
        foreach(str_split(count_chars($s,3))as$l)
            p($c,$s,$r.$l); #recursive
    else{
        for($v=str_pad("",$w=strlen($s),$r);$z<$w;) 
        $t+=$v[$z]==$s[$z++]; #compare strings
        $a[$t][]=$r; # insert value in array
    }
}
p($argv[1],$argv[2]); #start function with the input parameter
ksort($a); # sort result array 
echo join(" ",end($a)); #Output

Jörg Hülsermann

Posted 2017-03-22T16:16:48.477

Reputation: 13 026

1

Haskell, 84 bytes

import Data.Lists
f n=map(argmax=<<(length.).flip(filter.(==))).transpose.chunksOf n

Usage example:

f 10 "bbbbacacbcedecdbbbdebdaedcecdabcebddbdcecebbeeaacdebdbebaebcecddadeeedbbdbbaeaaeebbedbeeaeedadeecbcd"
"ebbbdbeece"

Split input string into chunks of length n, transpose and ffind for each sublist the most frequent element.

nimi

Posted 2017-03-22T16:16:48.477

Reputation: 34 639

1

Röda, 68 bytes

f s,n{seq 0,n-1|{|i|sort s/"",key={|c|x=s[i::n]x~=c,"";[#x]}|head}_}

Try it online!

It's a function that prints the output without trailing newline.

This was inspired by this answer.

fergusq

Posted 2017-03-22T16:16:48.477

Reputation: 4 867