Grow words from fertile vocabularies

6

An incremental word chain is a sequence of words of a vocabulary such that each word is the result of either prepending or appending a single character to the previous word, ignoring capitalization. An example of such a chain can be found below.

I → is → his → hiss

Your task is to write a program that for a given vocabulary finds the longest incremental word chain. The input is provided from standard input a vocabulary as one word per line. The words are composed of the lowercase letters a to z only. The output of your program shall go to standard output and shall be one word per line a longest word chain of the input vocabulary, starting with the shortest word.

This is code golf, the solution with the least character count wins. Different forms of input or output are not possible, but the terms standard input and standard output may be interpreted in a reasonable way for the language you are solving this problem in.

Please demonstrate that your code works by publishing the longest word chain you find on the xz-compressed vocabulary found here. This vocabulary was generated by executing the following series of commands on the file british-english-insane from the SCOWL project in the C locale:

tr 'A-Z' 'a-z' <british-english-insane | sed -n '/^[a-z]*$/p' | sort -u | xz -9e >voc.xz

A solution in polynomial time is possible and highly appreciated.

FUZxxl

Posted 2015-01-30T02:49:50.707

Reputation: 9 656

@MartinBüttner I agree with you that these challenges are similar, but an important difference is that in my challenge the graph is acyclic, so a solution in polynomial time can be found. – FUZxxl – 2015-01-30T10:04:18.680

Title and description don't match, as the title allows only appending whereas the description also allows prepending. Description trumps title, doesn't it? – nimi – 2015-01-30T10:15:49.820

@nimi You are correct. To put it simply, I didn't find a formulation of the problem that fits in the title and is correct. Maybe you can help me find one? – FUZxxl – 2015-01-30T10:50:19.387

FUZxxi, How about "Growing Words"? – DavidC – 2015-01-30T15:27:34.333

@DavidCarraher Better this way? – FUZxxl – 2015-01-30T17:23:18.947

Yes, I think it is better. Calls attention nicely. – DavidC – 2015-01-30T18:27:02.090

Is the input supposed to be the actual words, the name of a file containing the words, or something else? – KSFT – 2015-01-31T05:05:07.733

@KSFT The input is a newline separated list of words. – FUZxxl – 2015-02-01T22:39:03.353

Answers

1

Ruby 139 127

h={}
$<.map{|l|h[l.strip]=1}
v=->w,c=[]{w<?a?c:h[w]&&(v[w.chop,z=[w]+c]||v[w[1..-1],z])}
puts h.map{|k,_|v[k]||0}.max_by &:size

Runs in ~3.5 seconds and prints:

r
ra
rah
rahm
brahm
brahma
brahman
brahmani
brahmanis
brahmanism
brahmanisms

Cristian Lupascu

Posted 2015-01-30T02:49:50.707

Reputation: 8 369

@britishtea fixed! – Cristian Lupascu – 2015-01-31T08:25:49.320

2

Ruby, 127

Runs in about 6.5 seconds. Uses a couple of cool tricks. ARGF ($< or stdin, really) is enumerable! If an array is passed as a block argument, it can be "exploded" and assigned to a parameter using {|w,| ... }. Notice the comma.

I found an even shorter solution which uses Array#& (set intersection) to do the lookup of words, but that runs extremely slow, unfortunately.

Thanks to w0lf for shaving off a few characters

D={}
$<.map{|l|D[l.strip!]=l}
puts D.map{|w,|c=[w]
w.chars.all?{w=c[0]
r=D[w.chop]||D[w[1..-1]]
r ?c=[r]+c :p}
c}.max_by &:size

Output

r
ra
rah
rahm
brahm
brahma
brahman
brahmani
brahmanis
brahmanism
brahmanisms

Readable version

D = {}
$<.lines { |l| D[l.strip] = l.strip }

chains = D.map do |word, _|
  chain = [word]

  word.chars.all? do
    word   = c.first
    result = D[w.chop] || D[w[1..-1]]

    result ? chain.unshift(result) : nil
  end
end

puts chains.max_by(&:size)

britishtea

Posted 2015-01-30T02:49:50.707

Reputation: 1 189

Nice! You can save some more chars: max_by(&:size) --> max_by &:size and r ?c.unshift(r):p} --> r ?c=[r]+c :p} – Cristian Lupascu – 2015-01-31T08:38:53.470

1

Python 2, 208 bytes

Here is a not very small solution. It firsts creates an index of the word parts, then recursively searches for the longest chain.

import sys
d={}
b=lambda k:d.setdefault(k,[]).append(e)
for e in sys.stdin:e=e[:-1];b(e[1:]);b(e[:-1])
def f(a):
 g=a
 for c in d.get(a[-1],[]):g=max(g,f(a+[c]),key=len)
 return g
for e in f([''])[1:]:print e

It returns:

a
ab
rab
arab
arabi
arabin
arabine
carabine
carabiner
carabinero
carabineros

in 6.2 seconds.

The code before minifier looks like this:

import sys

lookup = {}
addindex = lambda k: lookup.setdefault(k, []).append(word)

for word in sys.stdin :
    word = word[:-1];
    addindex(word[1:]);
    addindex(word[:-1])
print 'Indexed' # DEBUG

def find(wordlist):
    best = wordlist
    for newword in lookup.get(wordlist[-1], []) :
        best = max(best, find(wordlist+[newword]), key=len)
    return best

for word in find([''])[1:] :
    print word

Logic Knight

Posted 2015-01-30T02:49:50.707

Reputation: 6 622

Please read from standard input and write the entire chain to standard output. – FUZxxl – 2015-01-30T10:05:03.743

Now uses stdin. I found a way to save a few bytes too. – Logic Knight – 2015-01-30T12:01:31.767

Your program is still incorrect. It has to print the entire chain, not just the last word in it. Also, it has to print the word chain in the format specified in the question. – FUZxxl – 2015-01-30T12:22:39.357

I take it that I am now presenting in the correct format? – Logic Knight – 2015-01-31T04:15:59.840

I hope the down-voter is more pleased with the improved answer :-) – Logic Knight – 2015-01-31T04:34:37.037

Yeah, the current format looks good. – FUZxxl – 2015-01-31T10:10:23.543

1

Haskell, 175 Bytes

import Data.Set
main=interact$unlines.snd.maximum.g.lines
g l=Prelude.map(f(0,[]))l where f z@(d,r)w|notMember w(fromList l)=z|1<2=max(f(d+1,w:r)(tail w))(f(d+1,w:r)(init w))

finds

r
ra
ran
fran
franc
franci
francis
francisc
francisca
franciscan
franciscans

Spent about 30 bytes for the use of a Set data structure for efficient lookups but that are useless (in code golf!) otherwise.

How it works: for every word from the input build recursively a list with valid child words. Take the longest list and print it. A "child word" is the word with either first or last letter missing. It is "valid" if it's in the input. If both child words exist, take the one that builds the longer sublist.

nimi

Posted 2015-01-30T02:49:50.707

Reputation: 34 639

"r" is not a word, so isn't this solution invalid? – David G. Stork – 2015-01-30T21:03:10.630

@DavidG.Stork It's part of the given dictionary. – agweber – 2015-01-30T21:10:17.070

@agweber: My goodness, "r" is now in the dictionary. I programmed in Mathematica an interesting, related problem (longest sequence of words where an insertion could occur anywhere) and, using the WordData all led to either "i" or "a". – David G. Stork – 2015-01-30T21:12:18.603

@DavidG.Stork Even if "r" is not a word the a → ra chain is still valid :) – kennytm – 2015-01-31T14:31:03.027

1

J: 246 bytes

's p'=:(*(>./i)>:])|:(6 s:[:s:}.;}:)S:0 w[i=:6 s:v=:s:w=:~.a:,<;._2 fread'voc'
l=:[:,([:(#S:0)5 s:_6 s:])"0
f=:3 :'(y{~])^:(~:0:)^:a:"0 y'
q=:_6 s:]
d=:[:+/@:*"1 f
ds=:d s[dp=:d p
n=:(ds<:dp)}s,:p[m=:ds>.dp
echo}:"1 q(r{i),.(r=:([:I.]=>./)m){f n

output:

`franciscans `franciscan `francisca `francisc `francis `franci `franc `fran `fra `fr `f

Version with comments.

tangentstorm

Posted 2015-01-30T02:49:50.707

Reputation: 111

Please obey the output format. – FUZxxl – 2015-01-30T21:17:32.817

Also, why is this community wiki? – Martin Ender – 2015-01-30T21:19:14.550

So someone else can edit it if they want. It's not really well golfed and I don't have time to work on it. – tangentstorm – 2015-01-30T21:31:36.380