Generate the shortest De Bruijn

22

3

A De Bruijn sequence is interesting: It is the shortest, cyclic sequence that contains all possible sequences of a given alphabet of a given length. For example, if we were considering the alphabet A,B,C and a length of 3, a possible output is:

AAABBBCCCABCACCBBAACBCBABAC

You will notice that every possible 3-character sequence using the letters A, B, and C are in there.

Your challenge is to generate a De Bruijn sequence in as few characters as possible. Your function should take two parameters, an integer representing the length of the sequences, and a list containing the alphabet. Your output should be the sequence in list form.

You may assume that every item in the alphabet is distinct.

An example generator can be found here

Standard loopholes apply

Nathan Merrill

Posted 2014-12-23T21:06:22.837

Reputation: 13 591

Can the integer representing the length of the sequences be larger than the number of unique letters? – kukac67 – 2014-12-23T21:17:50.753

Yes. A binary sequence of length 4 would be 0000111101100101 – Nathan Merrill – 2014-12-23T21:18:54.290

"Your challenge is to generate a De Bruijn sequence in as few characters as possible" - Does this mean "code golf" or "get the shortest possible De Bruijn sequence length"? – FryAmTheEggman – 2014-12-23T21:46:43.767

2Both. To qualify, your program must output the shortest sequence possible, but to win, your program must be the shortest. – Nathan Merrill – 2014-12-23T21:59:35.820

"A binary sequence of length 4 would be 0000111101100101" => 0100 1000 and 1010 are missing here... unless sequences can be read backwards? It's not told in the rules :) – xem – 2014-12-23T22:11:23.470

I know it's not allowed, but I would just like to point out that Mathematica has a built-in for this: DeBruijnSequence[#,#2]& (albeit not in the core language but the Combinatorica package). – kukac67 – 2014-12-23T23:19:03.157

I'm sure this has been posted before, but maybe it was closed as a duplicate of this very similar question and deleted.

– Peter Taylor – 2014-12-24T09:18:11.617

2@xem: usually De Bruijn sequences include wraparound, which is where those missing sequences appear. – Keith Randall – 2014-12-25T04:14:43.177

Answers

6

Pyth, 31 bytes

This is the direct conversion of the algorithm used in my CJam answer. Tips for golfing welcome!

Mu?G}H+GG+G>Hefq<HT>G-lGTUH^GHk

This code defines a function g which takes two arguments, the string of list of characters and the number.

Example usage:

Mu?G}H+GG+G>Hefq<HT>G-lGTUH^GHkg"ABC"3

Output:

AAABAACABBABCACBACCBBBCBCCC

Code expansion:

M                 # def g(G,H):
 u                #   return reduce(lambda G, H:
  ?G              #     (G if
    }H            #       (H in
      +GG         #          add(G,G)) else
    +G            #       add(G,
      >H          #         slice_end(H,
        e         #           last_element(
         f        #             Pfilter(lambda T:
          q       #               equal(
           <HT    #                 slice_start(H,T),
           >G     #                 slice_end(G,
             -lGT #                   minus(Plen(G),T))),
          UH      #               urange(H)))))),
  ^GH             #     cartesian_product(G,H),
  k               #     "")

Try it here

Optimizer

Posted 2014-12-23T21:06:22.837

Reputation: 25 836

4

CJam, 52 49 48 bytes

This is surprisingly long. This can be golfed a lot, taking in tips from the Pyth translation.

q~a*{m*:s}*{:H\:G_+\#)GGHH,,{_H<G,@-G>=},W=>+?}*

The input goes like

3 "ABC"

i.e. - String of list of characters and the length.

and output is the De Bruijn string

AAABAACABBABCACBACCBBBCBCCC

Try it online here

Optimizer

Posted 2014-12-23T21:06:22.837

Reputation: 25 836

1Gosh CJam should be banned, it is not just made for one golfing task but it seems for every possible golfing task... – flawr – 2014-12-23T21:42:38.223

2@flawr you should wait for a Pyth answer then :P – Optimizer – 2014-12-23T21:44:51.397

3

CJam, 52 49 bytes

Here is a different approach in CJam:

l~:N;:L,(:Ma{_N*N<0{;)_!}g(+_0a=!}g]{,N\%!},:~Lf=

Takes input like this:

"ABC" 3

and produces a Lyndon work like

CCCBCCACBBCBACABCAABBBABAAA

Try it here.

This makes use of the relation with Lyndon words. It generates all Lyndon words of length n in lexicographic order (as outlined in that Wikipedia article), then drops those whose length doesn't divide n. This already yields the De Bruijn sequence, but since I'm generating the Lyndon words as strings of digits, I also need to replace those with the corresponding letters at the end.

For golfing reasons, I consider the later letters in the alphabet to have lower lexicographic order.

Martin Ender

Posted 2014-12-23T21:06:22.837

Reputation: 184 808

1

Python 2, 114 bytes

I'm not really sure how to golf it more, due to my approach.

def f(a,n):
 s=a[-1]*n
 while 1:
    for c in a:
     if((s+c)[len(s+c)-n:]in s)<1:s+=c;break
    else:break
 print s[:1-n]

Try it online

Ungolfed:

This code is a trivial modification from my solution to more recent challenge.

def f(a,n):
    s=a[-1]*n
    while 1:
        for c in a:
            p=s+c
            if p[len(p)-n:]in s:
                continue
            else:
                s=p
                break
        else:
            break
    print s[:1-n]

The only reason [:1-n] is needed is because the sequence includes the wrap-around.

mbomb007

Posted 2014-12-23T21:06:22.837

Reputation: 21 944

1

Powershell, 164 96 bytes

-68 bytes with -match O($n*2^n) instead recursive generator O(n*log(n))

param($s,$n)for(;$z=$s|% t*y|?{"$($s[-1])"*($n-1)+$x-notmatch-join"$x$_"[-$n..-1]}){$x+=$z[0]}$x

Ungolfed & test script:

$f = {

param($s,$n)                    # $s is a alphabet, $n is a subsequence length
for(;                           # repeat until...
    $z=$s|% t*y|?{              # at least a character from the alphabet returns $true for expression:
        "$($s[-1])"*($n-1)+$x-notmatch  # the old sequence that follows two characters (the last letter from the alphabet) not contains
        -join"$x$_"[-$n..-1]    # n last characters from the new sequence
}){
    $x+=$z[0]                   # replace old sequence with new sequence
}
$x                              # return the sequence

}

@(
    ,("ABC",  2, "AABACBBCC")
    ,("ABC",  3, "AAABAACABBABCACBACCBBBCBCCC")
    ,("ABC",  4, "AAAABAAACAABBAABCAACBAACCABABACABBBABBCABCBABCCACACBBACBCACCBACCCBBBBCBBCCBCBCCCC")
    ,("ABC",  5, "AAAAABAAAACAAABBAAABCAAACBAAACCAABABAABACAABBBAABBCAABCBAABCCAACABAACACAACBBAACBCAACCBAACCCABABBABABCABACBABACCABBACABBBBABBBCABBCBABBCCABCACABCBBABCBCABCCBABCCCACACBACACCACBBBACBBCACBCBACBCCACCBBACCBCACCCBACCCCBBBBBCBBBCCBBCBCBBCCCBCBCCBCCCCC")
    ,("ABC",  6, "AAAAAABAAAAACAAAABBAAAABCAAAACBAAAACCAAABABAAABACAAABBBAAABBCAAABCBAAABCCAAACABAAACACAAACBBAAACBCAAACCBAAACCCAABAABAACAABABBAABABCAABACBAABACCAABBABAABBACAABBBBAABBBCAABBCBAABBCCAABCABAABCACAABCBBAABCBCAABCCBAABCCCAACAACABBAACABCAACACBAACACCAACBABAACBACAACBBBAACBBCAACBCBAACBCCAACCABAACCACAACCBBAACCBCAACCCBAACCCCABABABACABABBBABABBCABABCBABABCCABACACABACBBABACBCABACCBABACCCABBABBABCABBACBABBACCABBBACABBBBBABBBBCABBBCBABBBCCABBCACABBCBBABBCBCABBCCBABBCCCABCABCACBABCACCABCBACABCBBBABCBBCABCBCBABCBCCABCCACABCCBBABCCBCABCCCBABCCCCACACACBBACACBCACACCBACACCCACBACBACCACBBBBACBBBCACBBCBACBBCCACBCBBACBCBCACBCCBACBCCCACCACCBBBACCBBCACCBCBACCBCCACCCBBACCCBCACCCCBACCCCCBBBBBBCBBBBCCBBBCBCBBBCCCBBCBBCBCCBBCCBCBBCCCCBCBCBCCCBCCBCCCCCC")
    ,("01",   3, "00010111")
    ,("01",   4, "0000100110101111")
    ,("abcd", 2, "aabacadbbcbdccdd")
    ,("0123456789", 3, "0001002003004005006007008009011012013014015016017018019021022023024025026027028029031032033034035036037038039041042043044045046047048049051052053054055056057058059061062063064065066067068069071072073074075076077078079081082083084085086087088089091092093094095096097098099111211311411511611711811912212312412512612712812913213313413513613713813914214314414514614714814915215315415515615715815916216316416516616716816917217317417517617717817918218318418518618718818919219319419519619719819922232242252262272282292332342352362372382392432442452462472482492532542552562572582592632642652662672682692732742752762772782792832842852862872882892932942952962972982993334335336337338339344345346347348349354355356357358359364365366367368369374375376377378379384385386387388389394395396397398399444544644744844945545645745845946546646746846947547647747847948548648748848949549649749849955565575585595665675685695765775785795865875885895965975985996667668669677678679687688689697698699777877978878979879988898999")
    ,("9876543210", 3, "9998997996995994993992991990988987986985984983982981980978977976975974973972971970968967966965964963962961960958957956955954953952951950948947946945944943942941940938937936935934933932931930928927926925924923922921920918917916915914913912911910908907906905904903902901900888788688588488388288188087787687587487387287187086786686586486386286186085785685585485385285185084784684584484384284184083783683583483383283183082782682582482382282182081781681581481381281181080780680580480380280180077767757747737727717707667657647637627617607567557547537527517507467457447437427417407367357347337327317307267257247237227217207167157147137127117107067057047037027017006665664663662661660655654653652651650645644643642641640635634633632631630625624623622621620615614613612611610605604603602601600555455355255155054454354254154053453353253153052452352252152051451351251151050450350250150044434424414404334324314304234224214204134124114104034024014003332331330322321320312311310302301300222122021121020120011101000")
) |% {
    $s,$n,$e = $_
    $r = &$f $s $n
    "$($r-eq$e): $r"
}

Output:

True: AABACBBCC
True: AAABAACABBABCACBACCBBBCBCCC
True: AAAABAAACAABBAABCAACBAACCABABACABBBABBCABCBABCCACACBBACBCACCBACCCBBBBCBBCCBCBCCCC
True: AAAAABAAAACAAABBAAABCAAACBAAACCAABABAABACAABBBAABBCAABCBAABCCAACABAACACAACBBAACBCAACCBAACCCABABBABABCABACBABACCABBACABBBBABBBCABBCBABBCCABCACABCBBABCBCABCCBABCCCACACBACACCACBBBACBBCACBCBACBCCACCBBACCBCACCCBACCCCBBBBBCBBBCCBBCBCBBCCCBCBCCBCCCCC
True: AAAAAABAAAAACAAAABBAAAABCAAAACBAAAACCAAABABAAABACAAABBBAAABBCAAABCBAAABCCAAACABAAACACAAACBBAAACBCAAACCBAAACCCAABAABAACAABABBAABABCAABACBAABACCAABBABAABBACAABBBBAABBBCAABBCBAABBCCAABCABAABCACAABCBBAABCBCAABCCBAABCCCAACAACABBAACABCAACACBAACACCAACBABAACBACAACBBBAACBBCAACBCBAACBCCAACCABAACCACAACCBBAACCBCAACCCBAACCCCABABABACABABBBABABBCABABCBABABCCABACACABACBBABACBCABACCBABACCCABBABBABCABBACBABBACCABBBACABBBBBABBBBCABBBCBABBBCCABBCACABBCBBABBCBCABBCCBABBCCCABCABCACBABCACCABCBACABCBBBABCBBCABCBCBABCBCCABCCACABCCBBABCCBCABCCCBABCCCCACACACBBACACBCACACCBACACCCACBACBACCACBBBBACBBBCACBBCBACBBCCACBCBBACBCBCACBCCBACBCCCACCACCBBBACCBBCACCBCBACCBCCACCCBBACCCBCACCCCBACCCCCBBBBBBCBBBBCCBBBCBCBBBCCCBBCBBCBCCBBCCBCBBCCCCBCBCBCCCBCCBCCCCCC
True: 00010111
True: 0000100110101111
True: aabacadbbcbdccdd
True: 0001002003004005006007008009011012013014015016017018019021022023024025026027028029031032033034035036037038039041042043044045046047048049051052053054055056057058059061062063064065066067068069071072073074075076077078079081082083084085086087088089091092093094095096097098099111211311411511611711811912212312412512612712812913213313413513613713813914214314414514614714814915215315415515615715815916216316416516616716816917217317417517617717817918218318418518618718818919219319419519619719819922232242252262272282292332342352362372382392432442452462472482492532542552562572582592632642652662672682692732742752762772782792832842852862872882892932942952962972982993334335336337338339344345346347348349354355356357358359364365366367368369374375376377378379384385386387388389394395396397398399444544644744844945545645745845946546646746846947547647747847948548648748848949549649749849955565575585595665675685695765775785795865875885895965975985996667668669677678679687688689697698699777877978878979879988898999
True: 9998997996995994993992991990988987986985984983982981980978977976975974973972971970968967966965964963962961960958957956955954953952951950948947946945944943942941940938937936935934933932931930928927926925924923922921920918917916915914913912911910908907906905904903902901900888788688588488388288188087787687587487387287187086786686586486386286186085785685585485385285185084784684584484384284184083783683583483383283183082782682582482382282182081781681581481381281181080780680580480380280180077767757747737727717707667657647637627617607567557547537527517507467457447437427417407367357347337327317307267257247237227217207167157147137127117107067057047037027017006665664663662661660655654653652651650645644643642641640635634633632631630625624623622621620615614613612611610605604603602601600555455355255155054454354254154053453353253153052452352252152051451351251151050450350250150044434424414404334324314304234224214204134124114104034024014003332331330322321320312311310302301300222122021121020120011101000

See also: One Ring to rule them all. One String to contain them all

mazzy

Posted 2014-12-23T21:06:22.837

Reputation: 4 832

1

JavaScript (ES6) 143

Using Lyndon words, like Martin's aswer, just 3 times long...

F=(a,n)=>{
  for(w=[-a[l='length']],r='';w[0];)
  {
    n%w[l]||w.map(x=>r+=a[~x]);
    for(;w.push(...w)<=n;);
    for(w[l]=n;!~(z=w.pop()););
    w.push(z+1)
  }
  return r
}

Test In FireFox/FireBug console

console.log(F("ABC",3),F("10",4))

Output

CCCBCCACBBCBACABCAABBBABAAA 0000100110101111

edc65

Posted 2014-12-23T21:06:22.837

Reputation: 31 086