Find The Wordiest Combination Lock

18

3

I have a combination padlock which has letters instead of numbers. It looks like this: http://pictures.picpedia.com/2012/09/Word_Combination_Padlock.jpg There are 5 reels, each of which has 10 different letters on it.

Most people like to use a word for their combination rather than an arbitrary string of letters. (Less secure, of course, but easier to remember.) So when manufacturing the lock, it would be good to build it to have a combination of letters which can be used to create as many 5-letter English words as possible.

Your task, should you choose to accept it, is to find an assignment of letters to reels which will allow as many words as possible to be created. For example, your solution might be

ABCDEFGHIJ DEFGHIJKLM ZYXWVUTSR ABCDEFGHIJ ABCDEFGHIJ

(If you weren't feeling too imaginative, that is).

For consistency, please use the word list at http://www.cs.duke.edu/~ola/ap/linuxwords

Any 5-letter word in that list is OK, including proper names. Ignore Sino- and L'vov and any other words in the list which contain a non a-z character.

The winning program is the one which produces the largest set of words. In the event that multiple programs find the same result, the first one to be posted wins. The program should run in under 5 minutes.

Edit: since activity has died down, and no better solutions have come out, I declare Peter Taylor the winner! Thanks everyone for your inventive solutions.

Edwin Young

Posted 2013-01-21T06:53:56.353

Reputation: 183

How can one count proper names considering that they vary so much across cultures? – elssar – 2013-01-21T08:20:10.123

@elssar, If I understand correctly, any word in the list is OK, regardless if it's a proper name (in any culture). – ugoren – 2013-01-21T09:28:29.780

Oh right, the in there, didn't see that – elssar – 2013-01-21T09:41:41.930

So, not a code question; but logic? – Brigand – 2013-01-21T13:05:15.533

2This is tagged as [tag:code-challenge]: what's the challenge? All you've asked for is the value which maximises a function whose domain's size is about 110.3 bits. So it's not feasible to brute-force the problem, but it should be feasible to get the exact answer, and maybe even to prove it correct. Bearing all that in mind, what are the prerequisites for an answer to be considered, and what criteria are you going to use to select a winner? – Peter Taylor – 2013-01-21T13:39:34.117

How do we handle the words with hyphens and apostrophes? – Brigand – 2013-01-21T14:08:41.550

Edited to try and clear up the questions above. – Edwin Young – 2013-01-21T15:18:48.397

+1 Lovely challenge. Anyone can produce an answer. But finding the optimal answer is far harder than it first seems. – DavidC – 2013-01-21T16:12:33.867

The current best I found was 1210 and since then just heats the room. Has anyone any higher number? @PeterTaylor: It is possible to reduce the number of bits a bit (remove combinations which for each position alone are not able to produce at least 1210 possible words). – Howard – 2013-01-21T19:06:08.383

@Howard, I looked into that when the best answer I'd seen was 1115 and it was pretty bad. My current best is 1275, and I'm about to post the code. – Peter Taylor – 2013-01-21T19:32:44.513

@Howard, in fact even with a cutoff of 1275, there are 109.7 bits of domain to brute force. Go up to a cutoff of 2000 and there are 99.5 bits, which is still impractical, and likely to exclude all actual results anyway. – Peter Taylor – 2013-01-21T19:47:58.577

Here's the word checker I'm using. Press input and replace the strings with your strings. (It'd be nice if you put this in the main question.)

– Brigand – 2013-01-21T20:37:06.197

@PeterTaylor About the bits thing - just forgot the ;-) after "reduce a bit". I would be happy to see if anybody can reduce calculation time to reasonable ranges. – Howard – 2013-01-21T22:44:07.047

Answers

6

1275 words by simple greedy hill-climbing

Code is C#. Solution produced is

Score 1275 from ^[bcdfgmpstw][aehiloprtu][aeilnorstu][acdeklnrst][adehklrsty]$

I'm using that output format because it's really easy to test:

grep -iE "^[bcdfgmpstw][aehiloprtu][aeilnorstu][acdeklnrst][adehklrsty]$" linuxwords.txt | wc

namespace Sandbox {
    class Launcher {
        public static void Main(string[] args)
        {
            string[] lines = _Read5s();
            int[][] asMasks = lines.Select(line => line.ToCharArray().Select(ch => 1 << (ch - 'a')).ToArray()).ToArray();
            Console.WriteLine(string.Format("{0} words found", lines.Length));

            // Don't even bother starting with a good mapping.
            int[] combos = _AllCombinations().ToArray();
            int[] best = new int[]{0x3ff, 0x3ff, 0x3ff, 0x3ff, 0x3ff};
            int bestSc = 0;
            while (true)
            {
                Console.WriteLine(string.Format("Score {0} from {1}", bestSc, _DialsToString(best)));

                int[] prevBest = best;
                int prevBestSc = bestSc;

                // Greedy hill-climbing approach
                for (int off = 0; off < 5; off++)
                {
                    int[] dials = (int[])prevBest.Clone();

                    dials[off] = (1 << 26) - 1;
                    int[][] filtered = asMasks.Where(mask => _Permitted(dials, mask)).ToArray();
                    int sc;
                    dials[off] = _TopTen(filtered, off, out sc);
                    if (sc > bestSc)
                    {
                        best = (int[])dials.Clone();
                        bestSc = sc;
                    }
                }

                if (bestSc == prevBestSc) break;
            }

            Console.WriteLine("Done");
            Console.ReadKey();
        }

        private static int _TopTen(int[][] masks, int off, out int sc)
        {
            IDictionary<int, int> scores = new Dictionary<int, int>();
            for (int k = 0; k < 26; k++) scores[1 << k] = 0;

            foreach (int[] mask in masks) scores[mask[off]]++;

            int rv = 0;
            sc = 0;
            foreach (KeyValuePair<int, int> kvp in scores.OrderByDescending(kvp => kvp.Value).Take(10))
            {
                rv |= kvp.Key;
                sc += kvp.Value;
            }
            return rv;
        }

        private static string _DialsToString(int[] dials)
        {
            StringBuilder sb = new StringBuilder("^");
            foreach (int dial in dials)
            {
                sb.Append('[');
                for (int i = 0; i < 26; i++)
                {
                    if ((dial & (1 << i)) != 0) sb.Append((char)('a' + i));
                }
                sb.Append(']');
            }
            sb.Append('$');
            return sb.ToString();
        }

        private static IEnumerable<int> _AllCombinations()
        {
            // \binom{26}{10}
            int set = (1 << 10) - 1;
            int limit = (1 << 26);
            while (set < limit)
            {
                yield return set;

                // Gosper's hack:
                int c = set & -set;
                int r = set + c;
                set = (((r ^ set) >> 2) / c) | r;
            }
        }

        private static bool _Permitted(int[] dials, int[] mask)
        {
            for (int i = 0; i < dials.Length; i++)
            {
                if ((dials[i] & mask[i]) == 0) return false;
            }
            return true;
        }

        private static string[] _Read5s()
        {
            System.Text.RegularExpressions.Regex word5 = new System.Text.RegularExpressions.Regex("^[a-z][a-z][a-z][a-z][a-z]$", System.Text.RegularExpressions.RegexOptions.Compiled);
            return File.ReadAllLines(@"d:\tmp\linuxwords.txt").Select(line => line.ToLowerInvariant()).Where(line => word5.IsMatch(line)).ToArray();
        }
    }
}

Peter Taylor

Posted 2013-01-21T06:53:56.353

Reputation: 41 901

I was just about to edit my answer with this exact solution, but you beat me to it. – cardboard_box – 2013-01-21T19:38:53.220

When I run the same hill-climbing search from 1000 random starting combinations and select the best of the 1000 local optima found, it seems to always produce the same solution, so it seems likely to be the global optimum. – Peter Taylor – 2013-01-22T13:01:51.543

That depends on your definition of likely ;-) But it is further "confirmed" by other approaches which yield 1275 as maximum. (And where did the Quantum-Tic-Tac-Toe go?) – Howard – 2013-01-22T13:30:26.547

@Howard, that was just an artifact of .Net not supporting multiple entry points in a single project. I have one "sandbox" project which I use for stuff like this, and I usually change the Main method to call different _Main methods. – Peter Taylor – 2013-01-22T13:42:46.453

I tried a genetic algorithm and got the same result in a few minutes, and then nothing in the next hour, so I wouldn't be surprised if it's the optimum. – cardboard_box – 2013-01-22T14:52:00.550

Several different approaches I have tried have all also settled on this same solution. I agree- I see no reason to think it isn't optimal. – Joe K – 2013-01-22T19:39:35.410

You should be able to use ^\w{5}$ or ^[a-z]{5}$ as your regex filter. – Shmiddty – 2013-01-23T17:58:20.457

Also grep -c as a sidenote. – ulidtko – 2013-01-27T22:35:13.560

4

Python (3), 1273 ≈ 30.5%

This is a really naïve approach: keep a tally of the frequency of each letter in each position, then eliminate the "worst" letter until the remaining letters will fit on the reels. I'm surprised it seems to do so well.

What's most interesting is that I have almost exactly the same output as the C# 1275 solution, except I have an N on my last reel instead of A. That A was my 11th-to-last elimination, too, even before throwing away a V and a G.

from collections import Counter

def main(fn, num_reels, letters_per_reel):
    # Read ye words
    words = []
    with open(fn) as f:
        for line in f:
            word = line.strip().upper()
            if len(word) == num_reels and word.isalpha():
                words.append(word)

    word_pool_size = len(words)

    # Populate a structure of freq[reel_number][letter] -> count
    freq = [Counter() for _ in range(num_reels)]
    for word in words:
        for r, letter in enumerate(word):
            freq[r][letter] += 1

    while True:
        worst_reelidx = None
        worst_letter = None
        worst_count = len(words)
        for r, reel in enumerate(freq):
            # Skip reels that already have too-few letters left
            if len(reel) <= letters_per_reel:
                continue

            for letter, count in reel.items():
                if count < worst_count:
                    worst_reelidx = r
                    worst_letter = letter
                    worst_count = count

        if worst_letter is None:
            # All the reels are done
            break

        # Discard any words containing this worst letter, and update counters
        # accordingly
        filtered_words = []
        for word in words:
            if word[worst_reelidx] == worst_letter:
                for r, letter in enumerate(word):
                    freq[r][letter] -= 1
                    if freq[r][letter] == 0:
                        del freq[r][letter]
            else:
                filtered_words.append(word)
        words = filtered_words

    for reel in freq:
        print(''.join(sorted(reel)))

    print("{} words found (~{:.1f}%)".format(
        len(words), len(words) / word_pool_size * 100))

Produces:

BCDFGMPSTW
AEHILOPRTU
AEILNORSTU
ACDEKLNRST
DEHKLNRSTY
1273 words found (~30.5%)

Eevee

Posted 2013-01-21T06:53:56.353

Reputation: 141

What does the percentage represent? – Joe Z. – 2013-01-22T17:38:38.680

percentage of the given words that can be made with the proposed set of reels – Eevee – 2013-01-22T17:54:17.407

Okay. (Whoa, I just saw who you were.) – Joe Z. – 2013-01-22T17:55:33.920

ha, small world. – Eevee – 2013-01-22T20:42:15.590

3

Mathematica, 1275 words again and again...

This code is not Golfed as the question does not appear to call for that.

wordlist = Flatten @ Import @ "http://www.cs.duke.edu/~ola/ap/linuxwords";
shortlist = Select[ToLowerCase@wordlist, StringMatchQ[#, Repeated[LetterCharacter, {5}]] &];
string = "" <> Riffle[shortlist, ","];

set = "a" ~CharacterRange~ "z";
gb = RandomChoice[set, {5, 10}];

best = 0;
While[True,
  pos = Sequence @@ RandomInteger /@ {{1, 5}, {1, 10}};
  old = gb[[pos]];
  gb[[pos]] = RandomChoice @ set;
  If[best < #,
    best = #; Print[#, "   ", StringJoin /@ gb],
    gb[[pos]] = old
  ] & @ StringCount[string, StringExpression @@ Alternatives @@@ gb]
]

The word count quickly (less than 10 seconds) evolves to 1275 on most runs but never gets beyond that. I tried perturbing the letters by more than one at a time in an attempt to get out of a theoretical local maximum but it never helped. I strongly suspect that 1275 is the limit for the given word list. Here is a complete run:

36   {tphcehmqkt,agvkqxtnpy,nkehuaakri,nsibxpctio,iafwdyhone}

37   {tpicehmqkt,agvkqxtnpy,nkehuaakri,nsibxpctio,iafwdyhone}

40   {tpicehmqkt,agvkqxtnpy,nkehuaakri,nsibxpctio,iafldyhone}

42   {tpicehmqkt,agvkqxtnpy,nkehuaakri,nsfbxpctio,iafldyhone}

45   {tpicehmrkt,agvkqxtnpy,nkehuaakri,nsfbxpctio,iafldyhone}

48   {tpicehmrkt,agvkwxtnpy,nkehuaakri,nsfbxpctio,iafldyhone}

79   {tpicehmskt,agvkwxtnpy,nkehuaakri,nsfbxpctio,iafldyhone}

86   {tpicehmskt,agvkwxtnpy,nkehuaakri,esfbxpctio,iafldyhone}

96   {tpicehmskt,agvkwxtnpy,nkehuaokri,esfbxpctio,iafldyhone}

97   {tpicehmskt,agvkwxtnpy,nkehuaokri,esfbxpctio,ipfldyhone}

98   {tpicehmskv,agvkwxtnpy,nkehuaokri,esfbxpctio,ipfldyhone}

99   {tpicehmskv,agvkwxtnpy,nkehuaokri,esfbzpctio,ipfldyhone}

101   {tpicehmskv,agvkwxtnpy,nkehuaokri,esfhzpctio,ipfldyhone}

102   {tpicehmskv,agvkwxtnpy,nkehuaokri,esfhzpctno,ipfldyhone}

105   {tpicehmskv,agvkwxtnpy,nkehuaokri,esfhzmctno,ipfldyhone}

107   {tpicehmskn,agvkwxtnpy,nkehuaokri,esfhzmctno,ipfldyhone}

109   {tpgcehmskn,agvkwxtnpy,nkehuaokri,esfhzmctno,ipfldyhone}

115   {tpgcehmsan,agvkwxtnpy,nkehuaokri,esfhzmctno,ipfldyhone}

130   {tpgcehmsan,agvkwxtnpy,nkehuaokri,esfhzmctno,ipfldyhons}

138   {tpgcehmsan,agvkwxtnpy,nkehuaokri,esfhzmctno,ipfldytons}

143   {tpgcehmsab,agvkwxtnpy,nkehuaokri,esfhzmctno,ipfldytons}

163   {tpgcehmsab,auvkwxtnpy,nkehuaokri,esfhzmctno,ipfldytons}

169   {tpgcehmsab,auvkwctnpy,nkehuaokri,esfhzmctno,ipfldytons}

176   {tpgcehmsab,auvkwctnpy,nkehuaokri,esfhzmctno,ihfldytons}

189   {tpgcehmsab,auvkwchnpy,nkehuaokri,esfhzmctno,ihfldytons}

216   {tpgcehmsab,auvkwchnpy,nkehtaokri,esfhzmctno,ihfldytons}

220   {tpgcehmsab,auvkwthnpy,nkehtaokri,esfhzmctno,ihfldytons}

223   {tpgcehmsab,auvkwthnpy,nkehtaokri,esfhbmctno,ihfldytons}

234   {tpgcehmsab,auvkwthnpy,nkegtaokri,esfhbmctno,ihfldytons}

283   {tpgcehmsab,auvkwthnpy,nkegtaokri,esfhbrctno,ihfldytons}

285   {tpdcehmsab,auvkwthnpy,nkegtaokri,esfhbrctno,ihfldytons}

313   {tpdcehmsab,auvkwthnly,nkegtaokri,esfhbrctno,ihfldytons}

371   {tpdcehmsab,auvkethnly,nkegtaokri,esfhbrctno,ihfldytons}

446   {tpdcehmsab,auvoethnly,nkegtaokri,esfhbrctno,ihfldytons}

451   {tpdcehmslb,auvoethnly,nkegtaokri,esfhbrctno,ihfldytons}

465   {tpdcwhmslb,auvoethnly,nkegtaokri,esfhbrctno,ihfldytons}

545   {tpdcwhmslb,auioethnly,nkegtaokri,esfhbrctno,ihfldytons}

565   {tpdcwhmslb,auioethnly,nkegtaocri,esfhbrctno,ihfldytons}

571   {tpdcwhmslb,auioethnly,nkegtaocri,esfhwrctno,ihfldytons}

654   {tpdcwhmslb,auioethnly,nkegtaocri,esfhwrctno,ihfedytons}

671   {tpdcwhmslb,auioethnly,nkegtaocri,esfhirctno,ihfedytons}

731   {tpdcwhmslb,auioethnly,nkegtaocri,esfhirctno,ihredytons}

746   {tpdcwhmslb,arioethnly,nkegtaocri,esfhirctno,ihredytons}

755   {tpdcwhmslb,arioethnuy,nkegtaocri,esfhirctno,ihredytons}

772   {tpdcwhmslb,arioethnuy,nkegtaocri,ekfhirctno,ihredytons}

786   {tpdcwhmslb,arioethnuy,nkegtaocri,ekfhirctno,lhredytons}

796   {tpdcwhmslb,arioethnuy,nkegtaocri,ekfhgrctno,lhredytons}

804   {tpdcwhmslb,arioethwuy,nkegtaocri,ekfhgrctno,lhredytons}

817   {tpdcwhmslb,arioethwuy,nklgtaocri,ekfhgrctno,lhredytons}

834   {tpdcwhmslb,arioethwuy,nklgtaocri,ekfhdrctno,lhredytons}

844   {tpdcwhmslb,arioethwup,nklgtaocri,ekfhdrctno,lhredytons}

887   {tpdcwhmslb,arioethwup,nklgtaocri,ekshdrctno,lhredytons}

901   {tpdcwhmslb,arioethwup,nklgtaouri,ekshdrctno,lhredytons}

966   {tpdcwhmslb,arioethwup,nklgtaouri,elshdrctno,lhredytons}

986   {tpdcwhmsfb,arioethwup,nklgtaouri,elshdrctno,lhredytons}

1015   {tpdcwhmsfb,arioethwup,nklgtaouri,elsidrctno,lhredytons}

1039   {tpdcwhmsfb,arioethwup,nklgtaouri,elsidrctno,khredytons}

1051   {tpdcwhmsfb,arioethwup,nklgtaouri,elskdrctno,khredytons}

1055   {tpdcwhmsfb,arioethwup,nklgtaouri,elskdrctno,khredytlns}

1115   {tpdcwhmsfb,arioethwup,nelgtaouri,elskdrctno,khredytlns}

1131   {tpdcwhmsfb,arioethwup,nelwtaouri,elskdrctno,khredytlns}

1149   {tpdcwhmsfb,arioethwup,nelwtaouri,elskdrctna,khredytlns}

1212   {tpdcwhmsfb,arioelhwup,nelwtaouri,elskdrctna,khredytlns}

1249   {tpdcwhmsfb,arioelhwup,nelstaouri,elskdrctna,khredytlns}

1251   {tpgcwhmsfb,arioelhwup,nelstaouri,elskdrctna,khredytlns}

1255   {tpgcwdmsfb,arioelhwup,nelstaouri,elskdrctna,khredytlns}

1258   {tpgcwdmsfb,arioelhwup,nelstaouri,elskdrctna,khredytlas}

1262   {tpgcwdmsfb,arioelhwut,nelstaouri,elskdrctna,khredytlas}

1275   {tpgcwdmsfb,arioelhput,nelstaouri,elskdrctna,khredytlas}

Here are some other "winning" selections:

{"cbpmsftgwd", "hriuoepatl", "euosrtanli", "clknsaredt", "yhlkdstare"}
{"wptdsgcbmf", "ohlutraeip", "erotauinls", "lknectdasr", "sytrhklaed"}
{"cftsbwgmpd", "ropilhtaue", "niauseltor", "clstnkdrea", "esdrakthly"}
{"smgbwtdcfp", "ihulpreota", "ianrsouetl", "ekndasctlr", "kehardytls"}

As Peter comments these are actually the same solution in different orders. Sorted:

{"bcdfgmpstw", "aehiloprtu", "aeilnorstu", "acdeklnrst", "adehklrsty"}

Mr.Wizard

Posted 2013-01-21T06:53:56.353

Reputation: 2 481

@belisarius Thanks! It's more interesting with ENABLE2k.

– Mr.Wizard – 2013-01-24T17:13:21.477

I've been considering Combinatorica's NetworkFlow for this one, but haven't found a useful way to use it – Dr. belisarius – 2013-01-24T17:19:06.427

@belisarius I hope you find a way; I'd like to see that. – Mr.Wizard – 2013-01-24T17:19:47.527

@belisarius by the way my code for shortlist feels long, and though this is not Golf I'd like something shorter. Can you help? – Mr.Wizard – 2013-01-24T17:21:13.750

1I think your "winning" selections are all the same modulo permutation within dials. – Peter Taylor – 2013-01-24T18:49:55.127

@Peter I suspect(ed) that myself but I was too lazy to prove it. How long does your C code take to converge on the 1275 solution? (Or how many per second if it's really fast.) – Mr.Wizard – 2013-01-24T22:56:27.237

It takes 12 times round the main loop to converge, and running under Mono on my Linux box does it 100 times in just over 5 seconds. – Peter Taylor – 2013-01-24T23:05:31.403

2

Python, 1210 words (~ 29%)

Assuming I counted the words correctly this time, this is slightly better than FakeRainBrigand's solution. The only difference is I add each reel in order, and then remove all words from the list that don't match the reel so I get a slightly better distribution for the next reels. Because of this, it gives the exact same first reel.

word_list = [line.upper()[:-1] for line in open('linuxwords.txt','r').readlines() if len(line) == 6]
cur_list = word_list
s = ['']*5
for i in range(5):
    count = [0]*26
    for j in range(26):
        c = chr(j+ord('A'))
        count[j] = len([x for x in cur_list if x[i] == c])
    s[i] = [chr(x+ord('A')) for x in sorted(range(26),lambda a,b: count[b] - count[a])[:10]]
    cur_list = filter(lambda x:x[i] in s[i],cur_list)
for e in s:
    print ''.join(e)
print len(cur_list)

The program outputs

SBCAPFDTMG
AOREILUHTP
ARIOLENUTS
ENTLRCSAID
SEYDTKHRNL
1210

cardboard_box

Posted 2013-01-21T06:53:56.353

Reputation: 5 150

Nice, and 1210 works in my checker. – Brigand – 2013-01-21T20:26:10.073

1

iPython (273 210 Bytes, 1115 words)

1115/4176* ~ 27%

I calculated these in iPython, but my history (trimmed to remove debugging) looked like this.

with open("linuxwords") as fin: d = fin.readlines()
x = [w.lower().strip() for w in d if len(w) == 6]
# Saving for later use:
# with open("5letter", "w") as fout: fout.write("\n".join(x))
from string import lowercase as low
low=lowercase + "'"
c = [{a:0 for a in low} for q in range(5)]
for w in x:
    for i, ch in enumerate(w):
        c[i][ch] += 1

[''.join(sorted(q, key=q.get, reverse=True)[:10]) for q in c]

If we're going for short; I could trim it to this.

x = [w.lower().strip() for w in open("l") if len(w)==6]
c=[{a:0 for a in"abcdefghijklmnopqrstuvwxyz'-"}for q in range(5)]
for w in[w.lower().strip()for w in open("l") if len(w)==6]:
 for i in range(5):c[i][w[i]]+=1
[''.join(sorted(q,key=q.get,reverse=True)[:10])for q in c]

Shortened:

c=[{a:0 for a in"abcdefghijklmnopqrstuvwxyz'-"}for q in range(5)]
for w in[w.lower() for w in open("l")if len(w)==6]:
 for i in range(5):c[i][w[i]]+=1
[''.join(sorted(q,key=q.get,reverse=True)[:10])for q in c]

My results were: ['sbcapfdtmg', 'aoeirulhnt', 'aironeluts', 'etnlriaosc', 'seyrdtnlah'].

*My math on the 4176 may be a little short due to words with hyphens or apostrophes being omitted

Brigand

Posted 2013-01-21T06:53:56.353

Reputation: 1 137

1While this solution is a good heuristic and will likely return a good solution, I do not believe it is guaranteed to return the optimal solution. The reason is that you are not capturing the constraints between the reels: You are treating each reel as an independent variable when in fact they are dependent. For example, it might be the case that the words that share the most common first letter have a large variance in the distribution of their second letter. If such is the case, then your solution might produce combinations of reels that in fact do not allow any words at all. – ESultanik – 2013-01-21T14:56:53.693

1

Q

? (todo) words

Words should be stored in a file called words

(!:')10#/:(desc')(#:'')(=:')(+:)w@(&:)(5=(#:')w)&(&/')(w:(_:)(0:)`:words)in\:.Q.a

Runs in about 170 ms on my i7. It analyses the wordlist, looking for the most common letter in each position (obviously filtering out any non-candidates). It's a lazy naive solution but produces a reasonably good result with minimal code.

Results:

"sbcapfdtmg"
"aoeirulhnt"
"aironeluts"
"etnlriaosc"
"seyrdtnlah"

skeevey

Posted 2013-01-21T06:53:56.353

Reputation: 4 139

How many 5 letter words did you find? – DavidC – 2013-01-21T18:09:46.950

I did the same thing in python and got 16353. – cardboard_box – 2013-01-21T18:27:02.513

Is this the same greedy algorithm as FakeRainBrigand's? – Peter Taylor – 2013-01-21T18:32:54.133

It appears to be the same result as FakeRainBrigand's, so my count is probably wrong. – cardboard_box – 2013-01-21T18:38:03.900

1@cardboard_box, your result is definitely wrong. There aren't that many 5-letter words in the dictionary. – Peter Taylor – 2013-01-21T18:39:08.153

1Yep, it's 1115. I counted the number of correct letters in any word instead of the number of correct words. I think I need another coffee. – cardboard_box – 2013-01-21T18:45:42.810

0

Edit: Now that the rules have been modified, this approach is disqualified. I'm going to leave it here in case anyone is interested until I eventually getting around to modifying it for the new rules.

Python: 277 Characters

I'm pretty sure that the generalized version of this problem is NP-Hard, and the question didn't require finding the fastest solution, so here's a brute-force method of doing it:

import itertools,string
w=[w.lower()[:-1] for w in open('w') if len(w)==6]
v=-1
for l in itertools.product(itertools.combinations(string.ascii_lowercase,10),repeat=5):
 c=sum(map(lambda d:sum(map(lambda i:i[0] in i[1],zip(d,l)))==5,w))
 if c>v:
  v=c
  print str(c)+" "+str(l)

Note that I renamed the word list file to just "w" to save a few characters.

The output is the number of words that are possible from a given configuration followed by the configuration itself:

34 (('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'), ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'), ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'), ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'), ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'))
38 (('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'), ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'), ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'), ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'), ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'k'))
42 (('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'), ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'), ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'), ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'), ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'l'))
45 (('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'), ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'), ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'), ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'), ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'n'))
50 (('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'), ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'), ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'), ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'), ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'r'))
57 (('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'), ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'), ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'), ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'), ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 's'))
60 (('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'), ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'), ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'), ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'), ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'k', 's'))
64 (('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'), ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'), ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'), ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'), ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'l', 's'))
67 (('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'), ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'), ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'), ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'), ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'n', 's'))
72 (('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'), ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'), ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'), ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'), ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'r', 's'))
...

The last line of output before the program terminates is guaranteed to be the optimal solution.

ESultanik

Posted 2013-01-21T06:53:56.353

Reputation: 1 078

I'd love to see a C or ASM version of your code so that it could actually finish this year :-) Or at least run it until it gets to 1116. Could you write it without itertools, so I can run it on jython? (faster than regular python, but easier than cython.) – Brigand – 2013-01-21T15:13:58.410

Nevermind about the jython thing. I needed to grab the alpha. It still crashed (too much memory) but that appears unavoidable. – Brigand – 2013-01-21T15:36:48.487

I'm pretty sure that even if this were implemented in assembly it would take longer than my lifetime to complete on current hardware :-P – ESultanik – 2013-01-21T16:53:39.753

The issue is that I am iterating over (26 choose 10)^5 ≈ 4.23*10^33 possibilities. Even if we could test one possibility per nanosecond, it would take about 10^7 times the current age of the universe to finish. – ESultanik – 2013-01-21T16:59:58.997

1There are two characters which don't appear in the 5th position in any word in the given word list, so you can reduce the number of possibilities by a factor of about 4. That's how I got "about 110.3 bits" in my comment on the question. – Peter Taylor – 2013-01-21T17:29:33.880