14
3
This is a word game from a set of activity cards for children. Below the rules is code to find the best triplet using /usr/share/dict/words. I thought it was an interesting optimization problem, and am wondering if people can find improvements.
Rules
- Choose one letter from each of the sets below.
- Choose a word using the letters chosen (and any others).
- Score the word.
- Each letter from the chosen set gets the number shown with the set (repeats included).
AEIOU
count 0- All other letters are -2
- Repeat steps 1-3 above (no reusing letters in step 1) twice more.
- Final score is the sum of the three word scores.
Sets
(set 1 scores 1 point, set 2 scores 2 points, etc.)
- LTN
- RDS
- GBM
- CHP
- FWV
- YKJ
- QXZ
Code:
from itertools import permutations
import numpy as np
points = {'LTN' : 1,
'RDS' : 2,
'GBM' : 3,
'CHP' : 4,
'FWV' : 5,
'YKJ' : 6,
'QXZ' : 7}
def tonum(word):
word_array = np.zeros(26, dtype=np.int)
for l in word:
word_array[ord(l) - ord('A')] += 1
return word_array.reshape((26, 1))
def to_score_array(letters):
score_array = np.zeros(26, dtype=np.int) - 2
for v in 'AEIOU':
score_array[ord(v) - ord('A')] = 0
for idx, l in enumerate(letters):
score_array[ord(l) - ord('A')] = idx + 1
return np.matrix(score_array.reshape(1, 26))
def find_best_words():
wlist = [l.strip().upper() for l in open('/usr/share/dict/words') if l[0].lower() == l[0]]
wlist = [l for l in wlist if len(l) > 4]
orig = [l for l in wlist]
for rep in 'AEIOU':
wlist = [l.replace(rep, '') for l in wlist]
wlist = np.hstack([tonum(w) for w in wlist])
best = 0
ct = 0
bestwords = ()
for c1 in ['LTN']:
for c2 in permutations('RDS'):
for c3 in permutations('GBM'):
for c4 in permutations('CHP'):
for c5 in permutations('FWV'):
for c6 in permutations('YJK'):
for c7 in permutations('QZX'):
vals = [to_score_array(''.join(s)) for s in zip(c1, c2, c3, c4, c5, c6, c7)]
ct += 1
print ct, 6**6
scores1 = (vals[0] * wlist).A.flatten()
scores2 = (vals[1] * wlist).A.flatten()
scores3 = (vals[2] * wlist).A.flatten()
m1 = max(scores1)
m2 = max(scores2)
m3 = max(scores3)
if m1 + m2 + m3 > best:
print orig[scores1.argmax()], orig[scores2.argmax()], orig[scores3.argmax()], m1 + m2 + m3
best = m1 + m2 + m3
bestwords = (orig[scores1.argmax()], orig[scores2.argmax()], orig[scores3.argmax()])
return bestwords, best
if __name__ == '__main__':
import timeit
print timeit.timeit('print find_best_words()', 'from __main__ import find_best_words', number=1)
The matrix version is what I came up with after writing one in pure python (using dictionaries and scoring each word independently), and another in numpy but using indexing rather than matrix multiplication.
The next optimization would be to remove the vowels from the scoring entirely (and use a modified ord()
function), but I wonder if there are even faster approaches.
EDIT: added timeit.timeit code
EDIT: I'm adding a bounty, which I'll give to whichever improvement I most like (or possibly multiple answers, but I'll have to accrue some more reputation if that's the case).
3BTW, I wrote the code to give my eight-year-old three words to memorize for when he played the game against his mother. Now I know what xylopyrography means. – None – 2011-04-29T13:25:06.163
2This is a fun question. I think you could make it more likely to get responses if you provide the following: (1) A link to an online word list so everyone works with the same data set. (2) Put your solution in a single function. (3) Run that function using the time-it module to show timings. (4) Make sure to put the loading of dictionary data outside of the function so that we are not testing disk speed. People then can use the existing code as a framework for comparing their solutions. – None – 2011-04-29T14:17:27.930
I'll rewrite to use timeit, but for fair comparisons I'd have to use my own machine (which I'm happy to do for people posting solutions). A word list should be available on most systems, but if not, there are several here: http://wordlist.sourceforge.net/
– None – 2011-04-29T15:07:24.0971Fair comparisons can be had if each user times your solution and any other posted solutions against their own on their own machine. There will be some differences cross platform, but in general this method works. – None – 2011-04-29T17:57:15.073
Good point. I've added the timeit code. For testing, I suggest trimming the word list to 100 or so entries. – None – 2011-04-29T20:17:23.033
Why is this python specific? There seems to be no need for it. – Juan – 2011-04-29T21:57:14.507
@Juan: It came from SO; this wasn't originally intended to be a code challenge/puzzle. – Joey – 2011-04-30T07:07:43.157
@Joey: Exactly. @Juan: I'm interested in any new methods for solving the problem, but more in algorithmic improvements than particular implementations. – thouis – 2011-04-30T16:56:19.760
1Hm, in that case I wonder whether this is the correct site. I think SO would have been the best fit. – Joey – 2011-04-30T17:13:22.787
Feel free to flag it to be moved back... I'm not sure why it got moved in the first place. – thouis – 2011-04-30T18:46:19.803
thouis: Because some mod (one who I have never seen here, though) thought it would fit better here than on SO. I guess it was more an outside-this-site decision. At the very least it could have been discussed :) – Joey – 2011-05-01T14:23:44.363