I've recently been indulging myself in some nostalgia in the form of Bookworm Deluxe:

In case you haven't seen it before, it's a word game where the goal is to connect adjacent tiles to form words. In order to determine whether a string is a valid word, it checks it against its internal dictionary, which is stored in a compressed format that looks like this:

aa
2h
3ed
ing
s
2l
3iis
s
2rdvark
8s
4wolf
7ves

The rules for unpacking the dictionary are simple:

Read the number at the start of the line, and copy that many characters from the beginning of the previous word. (If there is no number, copy as many characters as you did last time.)
Append the following letters to the word.

So, our first word is aa, followed by 2h, which means "copy the first two letters of aa and append h," forming aah. Then 3ed becomes aahed, and since the next line doesn't have a number, we copy 3 characters again to form aahing. This process continues throughout the rest of the dictionary. The resulting words from the small sample input are:

aa
aah
aahed
aahing
aahs
aal
aaliis
aals
aardvark
aardvarks
aardwolf
aardwolves

Your challenge is to perform this unpacking in as few bytes as possible.

Each line of input will contain zero or more digits 0-9 followed by one or more lowercase letters a-z. You may take input and give output as either a list of strings, or as a single string with words separated by any character other than 0-9/a-z.

Here is another small test case with a few edge cases not covered in the example:

abc cba 1de fg hi 0jkl mno abcdefghijk 10l
=> abc cba cde cfg chi jkl mno abcdefghijk abcdefghijl

You may also test your code on the full dictionary: input, output.

Doorknob

Posted 2018-12-19T19:37:05.283

Reputation: 68 138

Is there a possibility that there will not be a number in the second line? Also, can we assume that no number except 0 will have leading 0s? – Erik the Outgolfer – 2018-12-19T19:43:01.400

@EriktheOutgolfer Yes, that is possible; I've added that to the test case. And yes, you can assume that (as well as that the number won't be greater than the length of the previous word). – Doorknob – 2018-12-19T19:46:18.873

11That's a cute compression format :] – Poke – 2018-12-19T20:19:43.837

1The locate program uses this type of encoding on pathnames. – Dan D. – 2018-12-19T21:24:42.963

I wrote this program for my actual use, about 15 years ago. Unfortunately I don't think I have the source anymore... – hobbs – 2018-12-20T01:24:14.253

If I remembered correctly, I must find this file out from the game about 10 years ago, when I want to win the game more easily. But I did not understand how the compression works. I should read this post earlier. :( – tsh – 2018-12-20T09:30:26.230

Can we assume the leading number can only be 1 digit long at most? – ASCII-only – 2018-12-30T10:58:50.050

@ASCII-only There is a counterexample in the test cases. – Doorknob – 2018-12-30T15:07:41.583

@Doorknob oh :/ didn't see that – ASCII-only – 2018-12-31T01:31:17.337

Answers

Vim, 57 bytes

:%s/\a/ &
:%norm +hkyiwjP
:g/\d/norm diw-@"yl+P
:%s/ //g

Parse the Bookworm dictionary format

Answers

Vim, 57 bytes

JavaScript (ES6), 66 62 61 bytes

Commented

Perl 6, 50 48 bytes

Explanation:

Ruby, 49 45 43 bytes

Explanation

C, 65 57 bytes

Explanation:

brainfuck, 201 bytes

brainfuck, 207 bytes

Explanation

Python 3.6+, 172 195 156 123 122 121 104 bytes

Explanation

Haskell, 82 81 bytes

Japt, 19 18 17 bytes

Jelly, 16 bytes

How it works

Python 2, 118 bytes

Retina 0.8.2, 69 bytes

Red, 143 bytes

Java (JDK), 150 bytes

Groovy, 74 bytes

Explanation:

Jelly, 27 bytes

Perl 5 -p, 45 41 bytes

Groovy, 103 99 bytes

05AB1E, 20 19 17 bytes

Common Lisp, 181 bytes

Python 2, 101 100 99 bytes

C# (Visual C# Interactive Compiler), 134 bytes

Scala, 226 129 102 bytes

Perl 5 `-p`, 45 41 bytes