Parse the Bookworm dictionary format

42

7

I've recently been indulging myself in some nostalgia in the form of Bookworm Deluxe.

In case you haven't seen it before, it's a word game where the goal is to connect adjacent tiles to form words. In order to determine whether a string is a valid word, it checks it against its internal dictionary, which is stored in a compressed format that looks like this:

aa
2h
3ed
ing
s
2l
3iis
s
2rdvark
8s
4wolf
7ves

The rules for unpacking the dictionary are simple:

  1. Read the number at the start of the line, and copy that many characters from the beginning of the previous word. (If there is no number, copy as many characters as you did last time.)

  2. Append the following letters to the word.

So, our first word is aa, followed by 2h, which means "copy the first two letters of aa and append h," forming aah. Then 3ed becomes aahed, and since the next line doesn't have a number, we copy 3 characters again to form aahing. This process continues throughout the rest of the dictionary. The resulting words from the small sample input are:

aa
aah
aahed
aahing
aahs
aal
aaliis
aals
aardvark
aardvarks
aardwolf
aardwolves
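As an ungolfed reference point, the two rules can be sketched in Python as follows. This is only an illustration of the format, not anyone's submission; the names `unpack`, `prev` and `count` are made up for this sketch.

```python
def unpack(lines):
    """Expand prefix-compressed lines into full words."""
    words = []
    prev = ""    # previously decoded word
    count = 0    # number of characters copied last time
    for line in lines:
        # Split the line into its leading digits and trailing letters.
        i = 0
        while i < len(line) and line[i] in "0123456789":
            i += 1
        digits, suffix = line[:i], line[i:]
        if digits:                      # rule 1: a number replaces the copy count
            count = int(digits)
        prev = prev[:count] + suffix    # rule 2: append the letters
        words.append(prev)
    return words


sample = ["aa", "2h", "3ed", "ing", "s", "2l", "3iis", "s",
          "2rdvark", "8s", "4wolf", "7ves"]
print("\n".join(unpack(sample)))
```

Running it on the sample input prints the twelve words listed above, from aa to aardwolves.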

Your challenge is to perform this unpacking in as few bytes as possible.

Each line of input will contain zero or more digits 0-9 followed by one or more lowercase letters a-z. You may take input and give output as either a list of strings, or as a single string with words separated by any character other than 0-9/a-z.

Here is another small test case with a few edge cases not covered in the example:

abc cba 1de fg hi 0jkl mno abcdefghijk 10l
=> abc cba cde cfg chi jkl mno abcdefghijk abcdefghijl
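For the single-string form of I/O, a rough Python sketch could rewrite each entry in place with a regex callback and leave the separators untouched. The function name `unpack_text` is hypothetical. The test case above exercises a missing number on the second entry (`cba`), an explicit `0`, and a two-digit count (`10`).

```python
import re


def unpack_text(text):
    """Decode a whole string, keeping whatever separates the entries."""
    prev, count = "", 0

    def expand(match):
        nonlocal prev, count
        digits, letters = match.group(1), match.group(2)
        if digits:                  # "0" is a non-empty string, so it counts
            count = int(digits)
        prev = prev[:count] + letters
        return prev

    # Each entry is zero or more digits followed by one or more letters.
    return re.sub(r"([0-9]*)([a-z]+)", expand, text)


print(unpack_text("abc cba 1de fg hi 0jkl mno abcdefghijk 10l"))
# prints: abc cba cde cfg chi jkl mno abcdefghijk abcdefghijl
```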

You may also test your code on the full dictionary: input, output.

Doorknob

Posted 2018-12-19T19:37:05.283

Reputation: 68 138

Is there a possibility that there will not be a number in the second line? Also, can we assume that no number except 0 will have leading 0s? – Erik the Outgolfer – 2018-12-19T19:43:01.400

@EriktheOutgolfer Yes, that is possible; I've added that to the test case. And yes, you can assume that (as well as that the number won't be greater than the length of the previous word). – Doorknob – 2018-12-19T19:46:18.873

That's a cute compression format :] – Poke – 2018-12-19T20:19:43.837

The locate program uses this type of encoding on pathnames. – Dan D. – 2018-12-19T21:24:42.963

I wrote this program for my actual use, about 15 years ago. Unfortunately I don't think I have the source anymore... – hobbs – 2018-12-20T01:24:14.253

If I remember correctly, I must have dug this file out of the game about 10 years ago, when I wanted to win the game more easily. But I did not understand how the compression worked. I should have read this post earlier. :( – tsh – 2018-12-20T09:30:26.230

Can we assume the leading number can only be 1 digit long at most? – ASCII-only – 2018-12-30T10:58:50.050

@ASCII-only There is a counterexample in the test cases. – Doorknob – 2018-12-30T15:07:41.583

@Doorknob oh :/ didn't see that – ASCII-only – 2018-12-31T01:31:17.337

Answers

13

Vim, 57 bytes

:%s/\a/ &
:%norm +hkyiwjP
:g/\d/norm diw-@"yl+P
:%s/ //g

Try it online!

James

Posted 2018-12-19T19:37:05.283

Reputation: 54 537

Would <H<G instead of the last substitution work? – user41805 – 2018-12-22T16:27:50.497

@cowsquack Unfortunately, no. Every input that doesn't start with a number increases the number of leading spaces, so there's no way to guarantee a < solution would unindent enough times. – James – 2018-12-22T19:26:45.587

I think you can do :%s/ * instead of the last substitution to save two bytes. – Dexter CD – 2019-01-02T12:24:20.513

10

JavaScript (ES6), 66 62 61 bytes

a=>a.map(p=s=>a=a.slice([,x,y]=/(\d*)(.*)/.exec(s),p=x||p)+y)

Try it online!

Commented

a =>                  // a[] = input, re-used to store the previous word
  a.map(p =           // initialize p to a non-numeric value
  s =>                // for each string s in a[]:
    a =               //   update a:
      a.slice(        //     extract the correct prefix from the previous word:
        [, x, y] =    //       load into x and y:
          /(\d*)(.*)/ //         the result of a regular expression which splits the new
          .exec(s),   //         entry into x = leading digits and y = trailing letters
                      //       this array is interpreted as 0 by slice()
        p = x || p    //       update p to x if x is not an empty string; otherwise leave
                      //       it unchanged; use this as the 2nd parameter of slice()
      )               //     end of slice()
      + y             //     append the new suffix
  )                   // end of map()

Arnauld

Posted 2018-12-19T19:37:05.283

Reputation: 111 334

5

Perl 6, 50 48 bytes

-2 bytes thanks to nwellnhof

{my$l;.map:{$!=S[\d*]=substr $!,0,$l [R||]=~$/}}

Try it online!

A port of Arnauld's solution. Man, that R|| trick was a rollercoaster from 'I think this could be possible', to 'nah, it's impossible', to 'kinda maybe possible' and finally 'aha!'

Explanation:

{my$l;.map:{$!=S[\d*]=substr $!,0,$l [R||]=~$/}}
{                                              }  # Anonymous code block
 my$l;    # Declare the variable $l, which is used for the previous number
      .map:{                                  }  # Map the input list to
            $!=              # $! is used to save the previous word
               S[\d*]=       # Substitute the number for
                      substr $!,0    # A substring of the previous word
                                 ,              # With the length of 
                                           ~$/     # The num if it exists
                                  $l [R||]=        # Otherwise the previous num

The $l [R||]=~$/ part roughly translates to $l= ~$/||+$l but... it has the same amount of bytes :(. Originally, it saved bytes using an anonymous variable so the my$l was gone but that doesn't work since the scope is now the substitution, not the map codeblock. Oh well. Anyways, R is the reverse metaoperator, so it reverses the arguments of ||, so the $l variable ends up being assigned the new number (~$/) if it exists, otherwise itself again.

It could be 47 bytes if Perl 6 didn't throw a kinda redundant compiler error for =~.

Jo King

Posted 2018-12-19T19:37:05.283

Reputation: 38 234

5

Ruby, 49 45 43 bytes

$0=$_=$0[/.{0#{p=$_[/\d+/]||p}}/]+$_[/\D+/]

Try it online!

Explanation

$0=                                         #Previous word, assign the value of
   $_=                                      #Current word, assign the value of
      $0[/.{0#{              }}/]           #Starting substring of $0 of length p which is
               p=$_[/\d+/]||p               #defined as a number in the start of $_ if any 
                                 +$_[/\D+/] #Plus any remaining non-digits in $_

Kirill L.

Posted 2018-12-19T19:37:05.283

Reputation: 6 693

5

C, 65 57 bytes

n;f(){char c[99];while(scanf("%d",&n),gets(c+n))puts(c);}

Try it online!

Explanation:

n;                     /* n is implicitly int, and initialized to zero. */

f() {                  /* the unpacking function. */

    char c[99];        /* we need a buffer to read into, for the longest line in
                          the full dictionary we need 12 + 1 bytes. */

    while(             /* loop while there is input left. */

        scanf("%d",&n) /* Read into n, if the read fails because this line
                          doesn't have a number n's value does not change.
                          scanf's return value is ignored. */

        ,              /* chain expressions with the comma operator. The loop
                          condition is on the right side of the comma. */

        gets(c+n))     /* we read into c starting from cₙ. c₀, c₁.. up to cₙ₋₁ form
                          the shared prefix of the word we are reading and the
                          previous word. When gets is successful it returns c+n
                          else it will return NULL. When the loop condition is
                          NULL the loop exits. */

        puts(c);}      /* print the unpacked word. */
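The same overwrite-in-place idea can be sketched in Python with a `bytearray` standing in for `char c[99]`. This is an illustration only, not a translation of the C answer: the name `unpack_in_place` and the explicit `end` marker (playing the role of the NUL terminator that `gets` writes) are assumptions of the sketch.

```python
def unpack_in_place(lines, size=99):
    buf = bytearray(size)   # shared buffer; its first n bytes survive between words
    n = 0                   # copy count, left unchanged when a line has no number
    out = []
    for line in lines:
        suffix = line.lstrip("0123456789")
        digits = line[:len(line) - len(suffix)]
        if digits:                       # like scanf("%d", &n) succeeding
            n = int(digits)
        data = suffix.encode("ascii")
        buf[n:n + len(data)] = data      # like gets(c + n): write starting at offset n
        end = n + len(data)              # where the C string's terminator would sit
        out.append(buf[:end].decode("ascii"))
    return out


print(unpack_in_place(["aa", "2h", "3ed", "ing", "s"]))
# prints: ['aa', 'aah', 'aahed', 'aahing', 'aahs']
```

Unlike the C buffer, a Python `bytearray` grows instead of overflowing if a word is longer than `size`.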

Dexter CD

Posted 2018-12-19T19:37:05.283

Reputation: 151

5

brainfuck, 201 bytes

,[[[-<+>>>+<<]>-[---<+>]<[[-<]>>]<[-]>>[<<,>>>[-[-<++++++++++>]]++++<[->+<]-[----->-<]<]<]>>>[[>>]+[-<<]>>[[>>]+[<<]>>-]]+[>>]<[-]<[<<]>[->[>>]<+<[<<]>]>[>.>]+[>[-]<,.[->+>+<<]>>----------]<[<<]>-<<<,]

Try it online!

Requires a trailing newline at the end of the input. A version without this requirement is 6 bytes longer:

brainfuck, 207 bytes

,[[[-<+>>>+<<]>-[---<+>]<[[-<]>>]<[-]>>[<<,>>>[-[-<++++++++++>]]++++<[->+<]-[----->-<]<]<]>>>[[>>]+[-<<]>>[[>>]+[<<]>>-]]+[>>]<[-]<[<<]>[->[>>]<+<[<<]>]>[>.>]+[>[-]<,[->+>+<<]>>[----------<.<]>>]<[<<]>-<<<,]

Try it online!

Both versions assume all numbers are strictly less than 255.

Explanation

The tape is laid out as follows:

tempinputcopy 85 0 inputcopy number 1 a 1 a 1 r 1 d 0 w 0 o 0 l 0 f 0 ...

The "number" cell is equal to 0 if no digits are input, and n+1 if the number n is input. Input is taken at the cell marked "85".
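The n+1 convention lets a single cell distinguish "no number" from an explicit 0. A small Python sketch of just that convention follows; it is not a transcription of the brainfuck program, and the names `read_number_cell` and `unpack_with_sentinel` are hypothetical.

```python
def read_number_cell(line):
    """Return (cell, rest): cell is 0 for no digits, n + 1 for the number n."""
    cell = 0
    i = 0
    while i < len(line) and line[i] in "0123456789":
        value = cell - 1 if cell else 0          # undo the +1 offset
        cell = value * 10 + int(line[i]) + 1     # shift in the digit, re-apply offset
        i += 1
    return cell, line[i:]


def unpack_with_sentinel(lines):
    prev, count, out = "", 0, []
    for line in lines:
        cell, rest = read_number_cell(line)
        if cell:                 # nonzero means a number was present, even 0
            count = cell - 1
        prev = prev[:count] + rest
        out.append(prev)
    return out


print(read_number_cell("0jkl"))   # prints: (1, 'jkl')
print(read_number_cell("fg"))     # prints: (0, 'fg')
```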

,[                     take input and start main loop
 [                     start number input loop
  [-<+>>>+<<]          copy input to tempinputcopy and inputcopy
  >-[---<+>]           put the number 85 in the cell where input was taken
  <[[-<]>>]            test whether input is less than 85; ending position depends on result of comparison
                       (note that digits are 48 through 57 while letters are 97 through 122)
  <[-]>                clean up by zeroing out the cell that didn't already become zero
  >[                   if input was a digit:
   <<,>>               get next input character
   >[-[-<++++++++++>]] multiply current value by 10 and add to current input
   ++++                set number cell to 4 (as part of subtracting 47)
   <[->+<]             add input plus 10*number back to number cell
   -[----->-<]         subtract 51
  <]                   move to cell we would be at if input were a letter
 <]                    move to input cell; this is occupied iff input was a digit

                       part 2: update/output word

 >>>                   move to number cell
 [                     if occupied (number was input):
  [>>]+[-<<]>>         remove existing marker 1s and decrement number cell to true value
  [[>>]+[<<]>>-]       create the correct amount of marker 1s
 ]
 +[>>]<[-]             zero out cell containing next letter from previous word
 <[<<]>                return to inputcopy
 [->[>>]<+<[<<]>]      move input copy to next letter cell
 >[>.>]                output word so far
 +[                    do until newline is read:
  >[-]<                zero out letter cell
  ,.                   input and output next letter or newline
  [->+>+<<]            copy to letter cell and following cell
  >>----------         subtract 10 to compare to newline
 ]
 <[<<]>-               zero out number cell (which was 1 to make copy loop shorter)
 <<<,                  return to input cell and take input
]                      repeat until end of input

Nitrodon

Posted 2018-12-19T19:37:05.283

Reputation: 9 181

4

Python 3.6+, 172 195 156 123 122 121 104 bytes

import re
def f(l,n=0,w=""):
 for s in l:t=re.match("\d*",s)[0];n=int(t or n);w=w[:n]+s[len(t):];yield w

Try it online!

Explanation

I caved and used regular expressions. This saved at least 17 bytes:

t=re.match("\d*",s)[0]

When the string doesn't begin with a digit at all, the match t will be the empty string. This means that:

n=int(t or n)

will be n if t is empty, and int(t) otherwise.

w=w[:n]+s[len(t):]

removes the number that the regular expression found from s (if there's no number found, it'll remove 0 characters, leaving s untruncated) and replaces all but the first n characters of the previous word with the current word fragment; and:

yield w

outputs the current word.

wizzwizz4

Posted 2018-12-19T19:37:05.283

Reputation: 1 895

4

Haskell, 82 81 bytes

tail.map concat.scanl p["",""]
p[n,l]a|[(i,r)]<-reads a=[take i$n++l,r]|1<2=[n,a]

Takes and returns a list of strings.

Try it online!

        scanl p["",""]        -- fold function 'p' into the input list starting with
                              -- a list of two empty strings and collect the
                              -- intermediate results in a list
  p [n,l] a                   -- 1st string of the list 'n' is the part taken from the last word
                              -- 2nd string of the list 'l' is the part from the current line
                              -- 'a' is the code from the next line
     |[(i,r)]<-reads a        -- if 'a' can be parsed as an integer 'i' and a string 'r'
       =[take i$n++l,r]       -- go on with the first 'i' chars from the last line (-> 'n' and 'l' concatenated) and the new ending 'r'
     |1<2                     -- if parsing is not possible
       =[n,a]                 -- go on with the previous beginning of the word 'n' and the new end 'a'
                              -- e.g. [         "aa",     "2h",      "3ed",       "ing"       ] 
                              -- ->   [["",""],["","aa"],["aa","h"],["aah","ed"],["aah","ing"]]
  map concat                  -- concatenate each sublist
tail                          -- drop first element. 'scanl' saves the initial value in the list of intermediate results. 

Edit: -1 byte thanks to @Nitrodon.
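For readers who don't speak Haskell, the scan over (kept prefix, new ending) pairs can be approximated in Python with `itertools.accumulate`. This is a loose sketch under the assumption of Python 3.8+ (for the `initial` argument); `step` and `unpack_scan` are made-up names.

```python
from itertools import accumulate


def step(state, line):
    """Mirror of 'p': turn the previous (kept, tail) pair into the next one."""
    kept, tail = state
    suffix = line.lstrip("0123456789")
    digits = line[:len(line) - len(suffix)]
    if digits:
        # A number was parsed: keep that many chars of the previous full word.
        return ((kept + tail)[:int(digits)], suffix)
    # No number: reuse the previous kept prefix with the new ending.
    return (kept, line)


def unpack_scan(lines):
    states = accumulate(lines, step, initial=("", ""))
    # Drop the initial state (like 'tail') and join each pair (like 'map concat').
    return [kept + tail for kept, tail in list(states)[1:]]


print(list(accumulate(["aa", "2h", "3ed", "ing"], step, initial=("", ""))))
# prints: [('', ''), ('', 'aa'), ('aa', 'h'), ('aah', 'ed'), ('aah', 'ing')]
```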

nimi

Posted 2018-12-19T19:37:05.283

Reputation: 34 639

Contrary to usual Haskell golfing wisdom, you can actually save one byte here by not defining the helper function as an infix operator. – Nitrodon – 2018-12-23T04:46:34.770

@Nitrodon: well spotted! Thanks! – nimi – 2018-12-23T10:07:16.857

3

Japt, 19 18 17 bytes

Initially inspired by Arnauld's JS solution.

;£=¯V=XkB ªV +XoB

Try it

                      :Implicit input of string array U
 £                    :Map each X
   ¯                  :  Slice U to index
      Xk              :    Remove from X
;       B             :     The lowercase alphabet (leaving only the digits or an empty string, which is falsey)
          ªV          :    Logical OR with V (initially 0)
    V=                :    Assign the result to V for the next iteration
             +        :  Append
              Xo      :  Remove everything from X, except
;               B     :   The lowercase alphabet
  =                   :  Reassign the resulting string to U for the next iteration

Shaggy

Posted 2018-12-19T19:37:05.283

Reputation: 24 623

2

Jelly, 16 bytes

⁹fØDVo©®⁸ḣ;ḟØDµ\

Try it online!

How it works

⁹fØDVo©®⁸ḣ;ḟØDµ\  Main link. Argument: A (array of strings)

              µ\  Cumulatively reduce A by the link to the left.
⁹                     Yield the right argument.
  ØD                  Yield "0123456789".
 f                    Filter; keep only digits.
    V                 Eval the result. An empty string yields 0.
     o©               Perform logical OR and copy the result to the register.
       ®              Yield the value in the register (initially 0).
        ⁸ḣ            Head; keep that many characters of the left argument.
          ;           Concatenate the result and the right argument.
            ØD        Yield "0123456789".
           ḟ          Filterfalse; keep only non-digits.

Dennis

Posted 2018-12-19T19:37:05.283

Reputation: 196 637

1

Python 2, 118 bytes

import re
n=0
l=input()
o=l.pop(0)
print o
for i in l:(N,x),=re.findall('(\d*)(.+)',i);n=int(N or n);o=o[:n]+x;print o

Try it online!

Erik the Outgolfer

Posted 2018-12-19T19:37:05.283

Reputation: 38 134

1

Retina 0.8.2, 69 bytes

+`((\d+).*¶)(\D)
$1$2$3
\d+
$*
+m`^((.)*(.).*¶(?<-2>.)*)(?(2)$)1
$1$3

Try it online! Link includes harder test cases. Explanation:

+`((\d+).*¶)(\D)
$1$2$3

For all lines that begin with letters, copy the number from the previous line, looping until all lines begin with a number.

\d+
$*

Convert the number to unary.

+m`^((.)*(.).*¶(?<-2>.)*)(?(2)$)1
$1$3

Use balancing groups to replace all 1s with the corresponding letter from the previous line. (This turns out to be slightly golfier than replacing all runs of 1s.)
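The three stages can be mimicked in Python to make the data flow easier to follow. This is a sketch of the same staged idea using plain loops rather than balancing groups, and `unpack_staged` is a hypothetical name.

```python
import re


def unpack_staged(lines):
    # Stage 1: give every line an explicit number by copying the last one seen.
    numbered, last = [], ""
    for line in lines:
        digits = re.match(r"\d*", line).group()
        last = digits or last
        numbered.append(last + line[len(digits):])

    # Stage 2: rewrite each number in unary, one "1" per character to copy.
    unary = [re.sub(r"^\d+", lambda m: "1" * int(m.group()), line)
             for line in numbered]

    # Stage 3: replace the k-th "1" of a line with the k-th character of the
    # previous line, which has already been expanded.
    out, prev = [], ""
    for line in unary:
        prev = "".join(prev[k] if ch == "1" else ch
                       for k, ch in enumerate(line))
        out.append(prev)
    return out


print(unpack_staged(["aa", "2h", "3ed", "ing"]))
# prints: ['aa', 'aah', 'aahed', 'aahing']
```

Stage 3 relies on the guarantee from the question's comments that a number never exceeds the length of the previous word.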

Neil

Posted 2018-12-19T19:37:05.283

Reputation: 95 035

1

Red, 143 bytes

func[b][a: charset[#"a"-#"z"]u: b/1 n: 0 foreach c b[parse c[copy m to a
p: copy s to end(if p<> c[n: do m]print u: rejoin[copy/part u n s])]]]

Try it online!

Galen Ivanov

Posted 2018-12-19T19:37:05.283

Reputation: 13 815

1

Java (JDK), 150 bytes

a->{String p="",s[];for(int n=0,i=0;i<a.length;a[i]=p=p.substring(0,n=s.length<1?n:new Short(s[0]))+a[i++].replaceAll("\\d",""))s=a[i].split("\\D+");}

Try it online!

Olivier Grégoire

Posted 2018-12-19T19:37:05.283

Reputation: 10 647

1

Groovy, 74 bytes

{w="";d=0;it.replaceAll(/(\d*)(.+)/){d=(it[1]?:d)as int;w=w[0..<d]+it[2]}}

Try it online!

Explanation:

{                                                                        }  Closure, sole argument = it
 w="";d=0;                                                                  Initialize variables
          it.replaceAll(/(\d*)(.+)/){                                   }   Replace every line (since this matches every line) and implicitly return. Loop variable is again it
                                     d=(it[1]?:d)as int;                    If a number is matched, set d to the number as an integer, else keep the value
                                                        w=w[0..<d]+it[2]    Set w to the first d characters of w, plus the matched string

ASCII-only

Posted 2018-12-19T19:37:05.283

Reputation: 4 687

0

Jelly, 27 bytes

f€ȯ@\V,ɗḟ€ɗØDZẎḊṖḣ2/Ż;"f€Øa

Try it online!

Erik the Outgolfer

Posted 2018-12-19T19:37:05.283

Reputation: 38 134

0

Perl 5 -p, 45 41 bytes

s:\d*:substr($p,0,$l=$&+$l*/^\D/):e;$p=$_

Try it online!

Explanation:

s:\d*:substr($p,0,$l=$&+$l*/^\D/):e;$p=$_ Full program, implicit input
s:   :                           :e;      Replace
  \d*                                       Any number of digits
      substr($p,0,              )           By a prefix of $p (previous result or "")
                  $l=  +                      With a length (assigned to $l) of the sum
                     $&                         of the matched digits
                          *                     and the product
                        $l                        of $l (previous length or 0)
                           /^\D/                  and whether there is no number in the beginning (1 or 0)
                                                (product is $l if no number)
                                    $p=$_ Assign output to $p
                                          Implicit output
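The length update is branch-free arithmetic: the matched digits plus the old length multiplied by a 0/1 flag. A Python sketch of that expression follows; `unpack_arith` and `no_number` are illustrative names, and `int(digits or 0)` stands in for Perl treating an empty match as 0.

```python
import re


def unpack_arith(lines):
    prev, length, out = "", 0, []
    for line in lines:
        digits = re.match(r"\d*", line).group()
        no_number = int(digits == "")    # 1 when the line starts with a letter
        # new length = matched number + old length * flag
        length = int(digits or 0) + length * no_number
        prev = prev[:length] + line[len(digits):]
        out.append(prev)
    return out


print(unpack_arith(["hi", "1de", "fg", "0jkl"]))
# prints: ['hi', 'hde', 'hfg', 'jkl']
```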

wastl

Posted 2018-12-19T19:37:05.283

Reputation: 3 089

0

Groovy, 103 99 bytes

{w=it[0];d=0;it.collect{m=it=~/(\d+)(.+)/;i=m.find()?{d=m[0][1] as int;m[0][2]}():it;w=w[0..<d]+i}}

Try it online!

GolfIsAGoodWalkSpoilt

Posted 2018-12-19T19:37:05.283

Reputation: 101

76? – ASCII-only – 2018-12-31T01:48:32.400

74? – ASCII-only – 2018-12-31T01:48:38.633

0

05AB1E, 20 19 17 bytes

õUvyþDõÊi£U}Xyá«=

Try it online or verify all test cases.

Explanation:

õ                  # Push an empty string ""
 U                 # Pop and store it in variable `X`
v                  # Loop `y` over the (implicit) input-list
 yþ                #  Push `y`, and leave only the digits (let's call it `n`)
   DõÊi  }         #  If it's NOT equal to an empty string "":
       £           #   Pop and push the first `n` characters of the string
        U          #   Pop and store it in variable `X`
          X        #  Push variable `X`
           yá      #  Push `y`, and leave only the letters
             «     #  Merge them together
              =    #  Print it (without popping)

Kevin Cruijssen

Posted 2018-12-19T19:37:05.283

Reputation: 67 575

0

Common Lisp, 181 bytes

(do(w(p 0))((not(setf g(read-line t()))))(multiple-value-bind(a b)(parse-integer g :junk-allowed t)(setf p(or a p)w(concatenate'string(subseq w 0 p)(subseq g b)))(format t"~a~%"w)))

Try it online!

Ungolfed:

(do (w (p 0))   ; w previous word, p previous integer prefix (initialized to 0)
    ((not (setf g (read-line t ()))))   ; read a line into new variable g
                                        ; and if null terminate: 
  (multiple-value-bind (a b)            ; let a, b the current integer prefix
      (parse-integer g :junk-allowed t) ; and the position after the prefix
    (setf p (or a p)                    ; set p to a, or keep p if a is nil (no numeric prefix)
          w (concatenate 'string        ; set w to the concatenation of prefix
             (subseq w 0 p)             ; characters from the previous word 
             (subseq g b)))             ; and the rest of the current line
    (format t"~a~%"w)))                 ; print the current word
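A rough Python analogue of the `parse-integer ... :junk-allowed t` step is a helper that returns both the parsed value and the position where parsing stopped. The names `parse_leading_int` and `unpack_lispy` are hypothetical. Note one difference: in Common Lisp only nil is false, so `(or a p)` is safe for a prefix of 0, while Python's `value or count` would wrongly skip 0, so the sketch tests for `None` explicitly.

```python
def parse_leading_int(text):
    """Return (value, index); value is None if text has no leading digits."""
    index = 0
    while index < len(text) and text[index] in "0123456789":
        index += 1
    value = int(text[:index]) if index else None
    return value, index


def unpack_lispy(lines):
    word, count, out = "", 0, []
    for line in lines:
        value, index = parse_leading_int(line)
        if value is not None:    # 0 is a valid prefix length, so don't use "or"
            count = value
        word = word[:count] + line[index:]
        out.append(word)
    return out


print(parse_leading_int("10l"))   # prints: (10, 2)
print(parse_leading_int("mno"))   # prints: (None, 0)
```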

As usual, the long identifiers of Common Lisp make it not particularly suitable for PPCG.

Renzo

Posted 2018-12-19T19:37:05.283

Reputation: 2 260

0

Python 2, 101 100 99 bytes

import re
s=n='0'
for t in input():(m,w),=re.findall('(\d*)(.+)',t);n=m or n;s=s[:int(n)]+w;print s

Try it online!

Chas Brown

Posted 2018-12-19T19:37:05.283

Reputation: 8 959

0

C# (Visual C# Interactive Compiler), 134 bytes

a=>{int l=0,m,n;var p="";return a.Select(s=>{for(m=n=0;s[m]<58;n=n*10+s[m++]-48);return p=p.Substring(0,l=m>0?n:l)+s.Substring(m);});}

Try it online!

-9 bytes thanks to @ASCIIOnly!

Less golfed...

// a is an input list of strings
a=>{
  // l: last prefix length
  // m: current number of digits
  // n: current prefix length
  int l=0,m,n;
  // previous word
  var p="";
  // run a LINQ select against the input
  // s is the current word
  return a.Select(s=>{
    // nibble digits from start of the
    // current word to build up the
    // current prefix length
    for(m=n=0;
      s[m]<58;
      n=n*10+s[m++]-48);
    // append the prefix from the
    // previous word to the current
    // word and capture values
    // for the next iteration
    return
      p=p.Substring(0,l=m>0?n:l)+
      s.Substring(m);
  });
}
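The digit-nibbling loop, and the reason the test must be on the digit count `m` rather than the value `n`, can be sketched in Python. The `buggy` flag is an assumption added purely for illustration; it reproduces the `n>0` variant so the difference on a line like `0jkl` is visible. The bounds check on `m` is also an addition of this sketch.

```python
def unpack_nibble(lines, buggy=False):
    last, prev, out = 0, "", []
    for s in lines:
        m = n = 0
        # Digits have character codes 48..57, so "< 58" means "is a digit" here.
        while m < len(s) and ord(s[m]) < 58:
            n = n * 10 + ord(s[m]) - 48
            m += 1
        has_number = (n > 0) if buggy else (m > 0)
        if has_number:
            last = n
        prev = prev[:last] + s[m:]
        out.append(prev)
    return out


print(unpack_nibble(["hi", "1de", "0jkl"]))              # prints: ['hi', 'hde', 'jkl']
print(unpack_nibble(["hi", "1de", "0jkl"], buggy=True))  # prints: ['hi', 'hde', 'hjkl']
```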

dana

Posted 2018-12-19T19:37:05.283

Reputation: 2 541

134? – ASCII-only – 2018-12-30T10:55:32.420

That's pretty cool :) I changed l=n>0?n:l to l=m>0?n:l because it wasn't picking up the case when a line started with zero (0jkl). Thanks for the tip! – dana – 2018-12-30T11:11:11.470

0

Scala, 226 129 102 bytes

Thanks @ASCII-only for their work here (and for the Groovy answer).

s=>{var(w,d,r)=("",0,"(\\d*)(.+)".r)
s map(_ match{case r(a,b)=>{if(a>"")d=a.toInt
w=w.take(d)+b}
w})}

Try it online!

V. Courtois

Posted 2018-12-19T19:37:05.283

Reputation: 868

:| both links are the same – ASCII-only – 2018-12-31T06:17:48.897

yeah, editing. I didn't know how to turn it out and was in a hurry so I did not modify what I did. – V. Courtois – 2018-12-31T06:21:31.380

130 – ASCII-only – 2018-12-31T08:27:27.707

129 – ASCII-only – 2018-12-31T08:35:47.883

127 – ASCII-only – 2018-12-31T23:34:23.483

Stop working at new year oO – V. Courtois – 2018-12-31T23:36:53.727

jokes aside, nicely done – V. Courtois – 2018-12-31T23:37:09.010

Let us continue this discussion in chat. – ASCII-only – 2018-12-31T23:39:37.907