Lyndon word factorization

11

1

Background

A Lyndon word is a non-empty string which is strictly lexicographically smaller than all its other rotations. It is possible to factor any string uniquely as the concatenation of Lyndon words such that these subwords are lexicographically non-increasing; your challenge is to do this as succinctly as possible.

Details

You should implement a function or program which enumerates the Lyndon word factorization of any printable ASCII string, in order, outputting the resultant substrings as an array or stream of some kind. Characters should be compared by their code points, and all standard input and output methods are allowed. As usual for , the shortest program in bytes wins.

Test Cases

''           []
'C'          ['C']
'aaaaa'      ['a', 'a', 'a', 'a', 'a']
'K| '        ['K|', ' ']
'abaca'      ['abac', 'a']
'9_-$'       ['9_', '-', '$']
'P&O(;'      ['P', '&O(;']
'xhya{Wd$'   ['x', 'hy', 'a{', 'Wd', '$']
'j`M?LO!!Y'  ['j', '`', 'M', '?LO', '!!Y']
'!9!TZ'      ['!9!TZ']
'vMMe'       ['v', 'MMe']
'b5A9A9<5{0' ['b', '5A9A9<5{', '0']

user1502040

Posted 2017-06-19T20:45:06.433

Reputation: 2 196

Related – xnor – 2017-06-19T20:49:31.717

Note that this is equivalent to splitting by <=ness. (I have no idea how to express this better :| ) – CalculatorFeline – 2017-06-19T21:01:38.103

Is this equivalent to repeatedly taking the first character and the prefix of all characters bigger than it? – xnor – 2017-06-19T21:07:48.097

@xnor No. 'abac' is a Lyndon word. – user1502040 – 2017-06-19T21:13:14.107

@user1502040 I see, ties are interesting. I'd suggest adding some test cases that catch this. – xnor – 2017-06-19T21:14:20.137

@xnor Your proposed method can fail even when all characters are distinct. e.g. take the 'P&O(;' test case. – user1502040 – 2017-06-19T21:55:56.063

@user1502040 I think that one works. You first take the P, and the next character is smaller, so you just take P. Then, you take &, and the rest of the string is bigger than it, so you take all of &O(;. – xnor – 2017-06-19T23:08:54.380

@xnor Ah right, brain failure. I was thinking of monotonic substrings rather than just > the first character. – user1502040 – 2017-06-19T23:19:35.180

Answers

5

Pyth, 17 16 bytes

-1 byte thanks to isaacg!

hf!ff>Y>YZUYT+./

Try it online!

Explanation

hf!ff>Y>YZUYT+./
              ./    Take all possible disjoint substring sets of [the input]
             +      plus [the input] itself (for the null string case).
 f                  Filter for only those sets which
  !f        T       for none of the substrings
    f  >YZUY        is there a suffix of the substring
     >Y             lexographically smaller than the substring itself.
h                   Return the first (i.e. the shortest) such set of substrings.

notjagan

Posted 2017-06-19T20:45:06.433

Reputation: 4 011

1hf!ff>Y>YZUYT+./ accounts for the empty string case with 1 less byte. – isaacg – 2017-06-19T22:12:19.433

Nice, thanks! I felt like there must have been a shorter way. – notjagan – 2017-06-19T22:15:35.917

1

Jelly, 18 bytes

ṙJ¹ÐṂF⁼
ŒṖÇ€Ạ$ÐfṀȯ

Try it online!

Dennis

Posted 2017-06-19T20:45:06.433

Reputation: 196 637

0

Pyth - 28 bytes

hf&SI_T.Am.A>Rdt.u.>N1tddT./

Test Suite.

Maltysen

Posted 2017-06-19T20:45:06.433

Reputation: 25 023

1This seems to fail on the empty string. – user1502040 – 2017-06-19T21:34:15.380