Shortest, lexicographically smallest generating string

17

A string x generates a string y if y is a substring of an infinite repeat of x. For example abc generates bcabcab.

Write a program to find the shortest, lexicographically smallest string that will generate the input. You are given on standard input a single line of text. You should print the generating string to standard output. For example:

input

bcabcabca

output

abc

Shortest code wins. You may assume the input contains only the characters a-z (and a trailing newline if you want).

Keith Randall

Posted 2011-06-15T16:36:31.953

Reputation: 19 865

Output should be in any order? Say output can be bac in your example rather than abc? – Ant's – 2011-06-15T17:10:44.313

@GroovyUser: no, the input is not a substring of a repeated pattern of bacs. – Keith Randall – 2011-06-15T17:28:18.750

But the input could consist of a substring of (bca)^n, which means bca is just as valid for the given example as abc. – JAB – 2011-06-16T18:12:50.720

1@JAB: bca is not the smallest lexicographically. – Keith Randall – 2011-06-16T20:47:02.673

Ah, I somehow missed that part. – JAB – 2011-06-17T13:40:27.443

Answers

9

Ruby 1.9, 40 characters

gets;a=?a;a.next!until(a*~/$/)[$_];$><<a

Assumes the input is not terminated by a newline. Also it's probably ridiculously slow for larger results.

$ echo -n "bcabcabca" | ruby genlex.rb 
abc
$ echo -n "barfoobarfoobarfoo" | ruby1.9 genlex.rb 
arfoob

Ventero

Posted 2011-06-15T16:36:31.953

Reputation: 9 842

2

Python 88 185 chars

import re
s=raw_input()
m=s.index(min(s))
s=s[m:]+s[:m]
i=0
while s.replace(s[:i],''):i+=1
m=min(s[:i])
s=re.findall('%s[\w]*?(?=%s|$)'%(m,m),s[:i])
m=s.index(min(s))
print ''.join(s[m:]+s[:m])

Output:

bcabcabca
abc

aaa
a

abc
abc

cccbbcccbbcccbb
bbccc

barfoofoobarfoofoo
arfoofoob

bacabac
abacbac

Vader

Posted 2011-06-15T16:36:31.953

Reputation: 441

Doesn't give you the lexicographically smallest string for some inputs, e.g. "bacabac" – Howard – 2011-06-16T03:23:41.287

@Howard You are right. I've updated my code, it is much longer now, but handles strings like bacabac correctly. – Vader – 2011-06-16T22:24:14.023

"abac" would be correct, see @yogsototh's answer: abacabacabac. – Howard – 2011-06-17T05:02:45.307

2

Haskell, 299 128 characters

import Data.List
main=interact(\z->minimum$filter(\w->isInfixOf z$concat$replicate(length z)w) $filter((/=)"")$inits=<<tails z)

Thanks to jloy! Now the version is both far shorter and I believe correct.

yogsototh

Posted 2011-06-15T16:36:31.953

Reputation: 211

1

So, the good news is that it's possible to golf this solution down to about 91 chars if you accept input on stdin as in Ventero's Ruby solution. Unfortunately, the input cabcabcabc produces abcabc, so this solution isn't quite there. I think you'll need to modify q++q++q in order to get the desired result. My quick attempt at fixing bumped things back up to 145 chars though. (Spoilers are here: https://gist.github.com/1035161)

– None – 2011-06-20T05:14:08.867

Thanks! I didn't knew about interact nor never though about inits <<= tails to get all substrings. I slightly modified your version to gain a bit of characters. I removed sort and changed filter(not.null) by filter((/=)""). Thanks again! – yogsototh – 2011-06-20T12:01:59.147

Why do you need (/=)"" condition? It doesn't seem to do anything. Also, getting rid of lambdas helps: you can get rid of w altogether using . operator and change main function to main=interact s to save a couple of characters. – Rotsor – 2011-06-21T15:06:12.647

I think the answer for "bca" is wrong. It should be "abc", but is "bca" now. – Rotsor – 2011-06-21T15:14:26.943

One possible solution is to use permutations instead of tails. – Rotsor – 2011-06-21T15:25:38.730

@Rostor is correct about the null filter. That was left over from an brief experiment with cycle and shouldn't be necessary. – None – 2011-06-26T01:40:30.200

2

Python, 161 159 166 140 141 134 132 chars

y=raw_input();i=n=l=len(y)
while i:
 if (y[:i]*l)[:l]==y:n=i
 i-=1
x=y[:n];y=x*2
while i<n:
 x=min(x,y[i:i+n])
 i+=1
print x

EDIT : Golfed the code after reading Jules Olléon's comment. Removed a 'bug' that bcdabcdab results in abbc.

EDIT2 : Fixed the bug (abaa results in aaa) spotted by Jules Olléon.

I don't know about Python well, so this code is probably 'not golfed'.

I love this rule:

You may assume the input contains only the characters a-z...

Inputs & Outputs

bcdabcd
abcd

bcabcabca
abc


abcdabcd
abcd

bcdabcdab
abcd

barfoofoobarfoofoobar
arfoofoob

cccbbcccbbcccbb
bbccc

aaaaaaaaaaaaaaaa
a

thequickbrownfox
brownfoxthequick

ababa
ab

abaa
aab

JiminP

Posted 2011-06-15T16:36:31.953

Reputation: 3 264

1Brown fox, the quick! Dog, the lazy! – JiminP – 2011-06-20T13:04:00.230

Nice solution, pretty short and probably the best complexity here! You could golf it a bit - for instance, you dont need "int" to compare strings; and replace "while i>0" by "while i" and "y=y+y" by "y*=2". – Jules Olléon – 2011-06-23T14:27:18.753

Actually there's a problem : for abaa it prints aaa... – Jules Olléon – 2011-06-23T16:09:51.910

@Jules Thanks for the comment! I didn't think about that one... – JiminP – 2011-06-23T16:18:07.313

You can do i-=1 instead of i=i-1. Likewise for the increment. – Lowjacker – 2011-06-26T11:13:00.763

2

Python, 121 137 129 chars

s=raw_input()
n=len(s)
l=[(s+s)[i/n:i/n+i%n+1]for i in range(n*n)]
print min(filter(lambda x:(x*len(s)).find(s)+1,sorted(l)),key=len)

EDIT: fixed the bug spotted by JiminP

Jules Olléon

Posted 2011-06-15T16:36:31.953

Reputation: 731

Wow, it's great! Unfortunately, it prints aabab for string ababa... :( – JiminP – 2011-06-23T14:52:34.420

Ok, fixed... it's getting longer :( – Jules Olléon – 2011-06-23T16:04:07.540

2

Ruby 1.9, 36

$><<(?a..gets).find{|s|(s*~/$/)[$_]}

Uses the same approach as Ventero's solution.

Lowjacker

Posted 2011-06-15T16:36:31.953

Reputation: 4 466

1

Mathematica 124 bytes

x = StringLength@(y = "");
For[i = 1, ! (s = y~StringTake~i)~StringRepeat~x~StringContainsQ~y,i++];
First@Sort@StringPartition[s <> s, i, 1]

Whitespace and newlines (in the presence of semicolons at the ends of lines) have no meaning in Mathematica and are included here for readability.

Input goes in between the quotation marks in the first line. If recast as a function, that takes string input like so:

f=(x=StringLength@(y=#);For[i=1,!(s=y~StringTake~i)~StringRepeat~x~StringContainsQ~y,i++];First@Sort@StringPartition[s<>s,i,1])&

f@"bca"

(* "abc" *)

f@"abaa"

(* "aab" *)

then it's 128 bytes.

The For loop takes the first i characters of the input and repeats them at least up to the length of the input, then checks if the input is a substring of the result. Having found the length of the period of the string, the StringPartition command concatenates two copies of that period and takes all substrings of that length from it (basically gets all cyclic permutations), then First@Sort finds the first one of them when lexicographically ordered.

LLlAMnYP

Posted 2011-06-15T16:36:31.953

Reputation: 361

0

PowerShell, 23 bytes

-join($args|group|% N*)

Try it online!

mazzy

Posted 2011-06-15T16:36:31.953

Reputation: 4 832

0

javascript 96 Chars.

var temp = {},len = str.length;
for(i in str) 
temp[str[i]] = true;
Object.keys(temp).join(""); 

Working Plunkr

ngLover

Posted 2011-06-15T16:36:31.953

Reputation: 101

1Welcome to the community ! I couldn't test your code though, could you provide either code reading from GET/POST and writing with alert or console.log or a function taking the input as a parameter and returning the output? – Aaron – 2015-10-20T09:02:46.653

@AaronGOUZIT added pluckr – ngLover – 2015-10-20T11:48:52.423

Thanks, that helps. Still, the code you posted can't be used alone, so that cheats the byte count. More importantly, I'm afraid your code doesn't respect the specs : I believe you return a set of unique letters used rather than a "generating string", which we should be able to repeat (as a whole) with optional truncation to obtain the input. I'm looking forward to seeing your updated code ! – Aaron – 2015-10-20T12:31:52.833