Guess the word (aka Lingo)

13

6

The goal of this challenge is to write a program able to guess a word in the smallest possible number of attempts. It is based on the concept of the Lingo TV show (http://en.wikipedia.org/wiki/Lingo_(U.S._game_show)).

Rules

Given a word length passed as the first argument on its command line, the player program disposes of five attempts to guess the word by writing a guess on its standard output followed by a single \n character.

After a guess is made, the program receives a string on its standard input, also followed by a single \n character.

The string has the same length as the word to guess and is comprised of a sequence of the following characters:

  • X: which means that the given letter is not present in the word to guess
  • ?: which means that the given letter is present in the word to guess, but at another location
  • O: which means that the letter at this location has been correctly guessed

For example, if the word to guess is dents, and the program sends the word dozes, it will receive OXX?O because d and s are correct, e is misplaced, and o and z are not present.

Be careful that if a letter is present more times in the guessing attempt than in the word to guess, it will not be marked as ? and O more times than the number of occurences of the letter in the word to guess. For example, if the word to guess is cozies and the program sends tosses, it will receive XOXXOO because there is only one s to locate.

Words are chosen from an english word list. If the word sent by the program is not a valid word of the correct length, the attempt is considered as an automatic failure and only X's are returned.
The player program should assume that a file named wordlist.txt and containing one word per line is present in the current working directory, and can be read as necessary.
Guesses should only be comprised of alphabetic low-case characters ([a-z]).
No other network or file operations are allowed for the program.

The game ends when a string only comprised of O is returned or after the program has made 5 attempts and was not able to guess the word.

Scoring

The score of a game is given by the given formula:

score = 100 * (6 - number_of_attempts)

So if the word is correctly guessed on the first try, 500 points are given. The last try is worth 100 points.

Failure to guess the word grants zero points.

The pit

Player programs will be evaluated by trying to have them guess 100 random words for each word length between 4 and 13 characters inclusively.
Random word selection will be done by advance so all entries will have to guess the same words.

The winning program, and accepted answer, will be the one to reach the highest score.

Programs will be run in an Ubuntu virtual machine, using the code at https://github.com/noirotm/lingo. Implementations in any language are accepted as long as reasonable instructions to compile and/or run them are provided.

I'm providing a few test implementations in ruby in the git repository, feel free to take inspiration from them.

This question will be periodically updated with rankings for published answers so challengers can improve their entries.

The official final evaluation will take place on the 1st of July.

Update

Entries can now assume the presence of wordlistN.txt files to speed up reading the word list for the current word length for N between 4 and 13 inclusively.

For example, there is a wordlist4.txt file containing all four letter words, and wordlist10.txt containing all ten letter words, and so on.

First round results

At the date of 2014-07-01, three entries have been submitted, with the following results:

                        4       5       6       7       8       9       10      11      12      13      Total
./chinese-perl-goth.pl  8100    12400   15700   19100   22100   25800   27900   30600   31300   33600   226600
java Lingo              10600   14600   19500   22200   25500   28100   29000   31600   32700   33500   247300
./edc65                 10900   15800   22300   24300   27200   29600   31300   33900   33400   33900   262600

** Rankings **
1: ./edc65 (262600)
2: java Lingo (247300)
3: ./chinese-perl-goth.pl (226600)

All entries performed consistently, with a clear winner, being @edc65's C++'s entry.

All contestants are pretty awesome. I have been unable so far to even beat @chinese-perl-goth.
If more entries are submitted, another evaluation will take place. Current entries can also be improved if you feel like you can do better.

SirDarius

Posted 2014-05-09T11:29:35.057

Reputation: 378

So basicly you have to make 2 programs/functions? 1 that knows the word and the other one that tries to guess it? – Teun Pronk – 2014-05-09T12:14:55.887

@TeunPronk no, only one program, the word guesser. The game engine exists and can be compiled from the github repository mentioned in the question. – SirDarius – 2014-05-09T12:24:11.870

1Just to clarify: if the program takes more than 6 tries to guess the word, does it get negative points or just zero? In other words, do we need logic to quit the program after 6 tries to avoid negative points? (Rules say zero points if the program fails to guess the word) – DankMemes – 2014-05-09T13:44:18.623

@ZoveGames: OP states *the player program disposes of five attempts to guess the word by writing a guess on its standard output...*. – Kyle Kanos – 2014-05-09T13:57:30.060

Sorry I didn't read completely... – DankMemes – 2014-05-09T14:07:55.720

1@ZoveGames after five attempts, your program should exit, but the game engine will forcibly terminate it if it refuses to do so :) – SirDarius – 2014-05-09T14:38:36.893

@SirDarius Is the program allowed to make you guess a new word? – MisterBla – 2014-05-15T12:12:41.610

@RichardA no, the player program is run only for one word. – SirDarius – 2014-05-15T12:18:28.107

@SirDarius Could you please explain what you mean with "Implementations in any language are accepted as long as reasonable instructions to compile and/or run them are provided."? I didn't quite understand what you meant with that. – MisterBla – 2014-05-15T12:22:21.617

@RichardA There is no language restriction for this challenge, however, if you choose an exotic language not readily available on Ubuntu, please give explanations how to have your program running. – SirDarius – 2014-05-15T12:34:57.570

@SirDarius Ubuntu ships with Python right? – MisterBla – 2014-05-15T12:37:17.793

1@RichardA yes right, don't worry about Python, it is a first-class citizen, so I will have no problem running some python code :) – SirDarius – 2014-05-15T12:39:13.393

Seems like Python is unable to pick up from stdin. Not sure why. pit.rb hangs when trying to read from stdin – MisterBla – 2014-05-17T21:30:34.857

The output of the program should be a valid word? Or can it be any sequence of character? I want to clarify the rule "If the word sent by the program is not a valid word of the correct length" whether it only wants to specify the length of the guess. Because I see this really similar to Mastermind, where you can guess any sequence of colored pegs.

– justhalf – 2014-05-19T01:40:01.617

@justhalf you must give a word that belongs to the wordlist, AND has the same length as the word to guess. – SirDarius – 2014-05-19T07:37:16.447

@RichardA what do you mean by pit.rb hangs? It takes quite a long time to finish pit.rb, for your information. And ensure that you do sys.stdout.flush() after each print in Python. – justhalf – 2014-05-20T08:36:50.600

@justhalf Well, after 10 minutes NOTHING happens. I do flush my output, but it seems either the lingo application waits for input indefinitely or my python script waits for output indefinitely. Send: print(words[index], end="", flush=True, file=sys.stdout) Receive: response = sys.stdin.readline() – MisterBla – 2014-05-20T09:06:53.890

@RichardA if you don't feel like publishing a non-working entry, you can always use a temporary Gist as a sandbox, so I can test things on my side. – SirDarius – 2014-05-20T09:34:51.043

@SirDarius http://pastebin.com/Rfx1D8Xc My entry so far.

– MisterBla – 2014-05-20T09:45:38.747

@RichardA using python 2 here, replaced the print line by print words[index] followed by sys.stdout.flush() and it works fine – SirDarius – 2014-05-20T09:57:01.067

@SirDarius I'm using Python 3, is that okay or do I HAVE to use Python 2? – MisterBla – 2014-05-20T10:20:51.693

You should not use end="", the game engine specifically waits for \n – justhalf – 2014-05-20T10:33:21.033

@RichardA Either will do, as long as it works ;) – SirDarius – 2014-05-20T10:34:56.310

Anyway, for testing you should use ./lingo instead of pit.rb, right? You can run just single word there. – justhalf – 2014-05-20T10:35:32.520

@justhalf Does it really? I might have found my bug then! – MisterBla – 2014-05-20T10:36:06.643

1@justhalf Thanks so much for that! I can finally continue! – MisterBla – 2014-05-20T10:37:33.340

What are the runtime constraints? My first guess is pretty slow (52 hours in my first try, but I can do better...) – edc65 – 2014-06-22T16:04:39.020

@edc65 I haven't enforced constraints yet, given that all posted entries are quite fast so far. 52 hours is not acceptable though ;) The challenge is not that hard, unless you are using an extremely complex algorithm, I don't see why a guess would take more than a second (not including the time needed to read the word list). Don't let this discourage you though, I'm eager to see more challengers ! – SirDarius – 2014-06-23T08:08:33.643

@SirDarius, is there any possibility to count the number of correct guesses beside the score? I feel my answer tends to give the correct answers albeit in more guesses. – justhalf – 2014-07-02T00:15:40.867

1@justhalf good idea indeed, i'll try to implement that – SirDarius – 2014-07-02T09:51:15.337

Answers

5

C++ 267700 Points

A porting from an old MasterMind engine.
Differences from MasterMind:

  • More slots
  • More symbols
  • Bigger solution space (but non so much, because not all symbols combination allowed)
  • The response is much informative, so we have more information after each guess
  • The response is slower to generate and that's a pity because my algorithm have to do it a lot.

The basic idea is choosing the word that minimize the solution space. The algorithm is really slow for the first guess (I mean 'days'), but the best first guess depends only on the word len, so it's hardcoded in the source. The other guesses are done in matter of seconds.

The Code

(Compile with g++ -O3)

#include <iostream>
#include <iomanip>
#include <fstream>
#include <string>
#include <ctime>
#include <cstdlib>

using namespace std;

class LRTimer
{
private:
    time_t start;
public:
    void startTimer(void)
    {
        time(&start);
    }

    double stopTimer(void)
    {
        return difftime(time(NULL),start);
    } 

};

#define MAX_WORD_LEN 15
#define BIT_QM 0x8000

LRTimer timer;
int size, valid, wordLen;

string firstGuess[] = { "", "a", "as", "iao", "ares", 
    "raise", "sailer", "saltier", "costlier", "clarities", 
    "anthelices", "petulancies", "incarcerates", "allergenicity" };

class Pattern
{
public:
    char letters[MAX_WORD_LEN];
    char flag;
    int mask;

    Pattern() 
        : letters(), mask(), flag()
    {
    }

    Pattern(string word) 
        : letters(), mask(), flag()
    {
        init(word);
    }

    void init(string word)
    {
        const char *wdata = word.data();
        for(int i = 0; i < wordLen; i++) {
            letters[i] = wdata[i];
            mask |= 1 << (wdata[i]-'a');
        }
    }

    string dump()
    {
        return string(letters);
    }

    int check(Pattern &secret)
    {
        if ((mask & secret.mask) == 0)
            return 0;

        char g[MAX_WORD_LEN], s[MAX_WORD_LEN];
        int r = 0, q = 0, i, j, k=99;
        for (i = 0; i < wordLen; i++)
        {
            g[i] = (letters[i] ^ secret.letters[i]);
            if (g[i])
            {
                r += r;
                k = 0;
                g[i] ^= s[i] = secret.letters[i];
            }
            else
            {
                r += r + 1;
                s[i] = 0;
            }
        }
        for (; k < wordLen; k++)
        {
            q += q;
            if (g[k]) 
            {
                for (j = 0; j < wordLen; j++)
                    if (g[k] == s[j])
                    {
                        q |= BIT_QM;
                        s[j] = 0;
                        break;
                    }
            }
        }
        return r|q;
    }

    int count(int ck, int limit);

    int propcheck(int limit);

    void filter(int ck);
};

string dumpScore(int ck)
{
    string result(wordLen, 'X');
    for (int i = wordLen; i--;)
    {
        result[i] = ck & 1 ? 'O' : ck & BIT_QM ? '?' : 'X';
        ck >>= 1;
    }
    return result;
}

int parseScore(string ck)
{
    int result = 0;
    for (int i = 0; i < wordLen; i++)
    {
        result += result + (
            ck[i] == 'O' ? 1 : ck[i] == '?' ? BIT_QM: 0
        );
    }
    return result;
}

Pattern space[100000];

void Pattern::filter(int ck)
{
    int limit = valid, i = limit;
//  cerr << "Filter IN Valid " << setbase(10) << valid << " This " << dump() << "\n"; 

    while (i--)
    {
        int cck = check(space[i]);
//      cerr << setbase(10) << setw(8) << i << ' ' << space[i].dump() 
//          << setbase(16) << setw(8) << cck << " (" << Pattern::dumpScore(cck) << ") ";

        if ( ck != cck )
        {
//          cerr << " FAIL\r" ;
            --limit;
            if (i != limit) 
            {
                Pattern t = space[i];
                space[i] = space[limit];
                space[limit] = t;
            }
        }
        else
        {
//          cerr << " PASS\n" ;
        }
    }
    valid = limit;
//  cerr << "\nFilter EX Valid " << setbase(10) << valid << "\n"; 
};

int Pattern::count(int ck, int limit)
{
    int i, num=0;
    for (i = 0; i < valid; ++i)
    {
        if (ck == check(space[i]))
            if (++num >= limit) return num;
    }
    return num;
}

int Pattern::propcheck(int limit)
{
    int k, mv, nv;

    for (k = mv = 0; k < valid; ++k)
    {
        int ck = check(space[k]);
        nv = count(ck, limit);
        if (nv >= limit)
        {
            return 99999;
        }
        if (nv > mv) mv = nv;
    }
    return mv;
}

int proposal(bool last)
{
    int i, minnv = 999999, mv, result;

    for (i = 0; i < valid; i++) 
    {
        Pattern& guess = space[i];
//      cerr << '\r' << setw(6) << i << ' ' << guess.dump();
        if ((mv = guess.propcheck(minnv)) < minnv)
        {
//          cerr << setw(6) << mv << ' ' << setw(7) << setiosflags(ios::fixed) << setprecision(0) << timer.stopTimer() << " s\n";
            minnv = mv;
            result = i;
        }
    }   
    if (last) 
        return result;
    minnv *= 0.75;
    for (; i<size; i++) 
    {
        Pattern& guess = space[i];
//      cerr << '\r' << setw(6) << i << ' ' << guess.dump();
        if ((mv = guess.propcheck(minnv)) < minnv)
        {
//          cerr << setw(6) << mv << ' ' << setw(7) << setiosflags(ios::fixed) << setprecision(0) << timer.stopTimer() << " s\n";
            minnv = mv;
            result = i;
        }
    }   
    return result;
}

void setup(string wordfile)
{
    int i = 0; 
    string word;
    ifstream infile(wordfile.data());
    while(infile >> word)
    {
        if (word.length() == wordLen) {
            space[i++].init(word);
        }
    }
    infile.close(); 
    size = valid = i;
}

int main(int argc, char* argv[])
{
    if (argc < 2) 
    {
        cerr << "Specify word length";
        return 1;
    }

    wordLen = atoi(argv[1]);

    timer.startTimer();
    setup("wordlist.txt");
    //cerr << "Words " << size 
    //  << setiosflags(ios::fixed) << setprecision(2)
    //  << " " << timer.stopTimer() << " s\n";

    valid = size;
    Pattern Guess = firstGuess[wordLen];
    for (int t = 0; t < 5; t++)
    {
        cout << Guess.dump() << '\n' << flush;
        string score;
        cin >> score;
        int ck = parseScore(score);
        //cerr << "\nV" << setw(8) << valid << " #" 
        //  << setw(3) << t << " : " << Guess.dump()
        //  << " : " << score << "\n";
        if (ck == ~(-1 << wordLen))
        {
            break;
        }
        Guess.filter(ck); 
        Guess = space[proposal(t == 3)];
    }
    // cerr << "\n";

    double time = timer.stopTimer();
    //cerr << setiosflags(ios::fixed) << setprecision(2)
    //   << timer.stopTimer() << " s\n";

    return 0;
}

My scores

Evaluation with lingo, 100 rounds:

4   9000
5   17700
6   22000
7   25900
8   28600
9   29700
10  31000
11  32800
12  33500
13  34900

Total 265'100

Self evalueted scores

Here are my average points, scored on the whole wordlist. Not completely relyable because some detail of algorithm has changed during the tests.

 4 # words  6728 PT AVG   100.98 87170.41 s
 5 # words 14847 PT AVG   164.44 42295.38 s
 6 # words 28127 PT AVG   212.27 46550.00 s 
 7 # words 39694 PT AVG   246.16 61505.54 s
 8 # words 49004 PT AVG   273.23 63567.45 s
 9 # words 50655 PT AVG   289.00 45438.70 s
10 # words 43420 PT AVG   302.13 2952.23 s
11 # words 35612 PT AVG   323.62 3835.00 s
12 # words 27669 PT AVG   330.19 5882.98 s
13 # words 19971 PT AVG   339.60 2712.98 s

According to these numbers, my average score should be near 257'800

PIT SCORE

At last I installed Ruby, so now I have an 'official' score:

    4       5       6       7       8       9      10      11      12      13   TOTAL
10700   16300   22000   25700   27400   30300   32000   33800   34700   34800   267700

edc65

Posted 2014-05-09T11:29:35.057

Reputation: 31 086

My intention was to create something like this. Alas I couldn't find how to truly minimize the solution space, so I approximated it. And mine is in Python, so it's even slower, haha. I also hardcoded the first guess. Yours is definitely better than mine for the shorter strings. Can you test with my implementation also on the same set of input to compare? Also we have quite a different set of first guesses. – justhalf – 2014-06-30T10:37:26.193

@justhalf I tried some rounds with lingo.go. I didn't check with the pit (I don't have Ruby installed). Our scores are close, it's a matter of luck I think. – edc65 – 2014-06-30T12:47:17.680

Yours is better I think, since your reported average is better than the score that I reported. Although you seem to take substantially more time. – justhalf – 2014-07-01T04:36:27.427

This seems to be the strongest player so far. I am going to run the official result later today, stay tuned ! – SirDarius – 2014-07-01T09:24:54.597

Oops, correction for my comment above, I forgot that my submission is in Java. – justhalf – 2014-07-01T14:55:17.440

5

Java, 249700 points (beats Chinese Perl Goth in my test)

Updated ranklist:

                        4       5       6       7       8       9       10      11      12      13      Total
perl chinese_perl_goth.pl 6700  12300   16900   19200   23000   26100   28500   29600   32100   33900   228300
java Lingo              9400    14700   18900   21000   26300   28700   30300   32400   33800   34200   249700

Here is the old ranklist using pit.rb:

                        4       5       6       7       8       9       10      11      12      13      Total
ruby player-example.rb  200     400     400     500     1800    1400    1700    1600    3200    4400    15600
ruby player-example2.rb 2700    3200    2500    4300    7300    6300    8200    10400   13300   15000   73200
ruby player-example3.rb 4500    7400    9900    13700   15400   19000   19600   22300   24600   27300   163700
perl chinese_perl_goth.pl 6400  14600   16500   21000   22500   26000   27200   30600   32500   33800   231100
java Lingo              4800    13100   16500   21400   27200   29200   30600   32400   33700   36100   245000

** Rankings **
1: java Lingo (245000)
2: perl chinese_perl_goth.pl (231100)
3: ruby player-example3.rb (163700)
4: ruby player-example2.rb (73200)
5: ruby player-example.rb (15600)

Compared to @chineseperlgoth, I lose in shorter words (< 6 chars) but I win in longer words (>= 6 chars).

The idea is similar to @chineseperlgoth, it's just that my main idea is about finding the guess (can be any word of the same length, not necessarily one of the remaining possibilities) which gives the most information for the next guess.

Currently I'm still playing with the formula, but for the scoreboard above, I choose the word which will yield the minimum for:

-num_confusion * entropy

The latest version uses different scoring to find next best guess, which is maximizing the number of "single possibility" after the current guess. This is done by trying all words in pruned wordlist (to save time) on all possible candidates, and see which guess is more probable to produce "single possibility" (that is, after this guess there will only be one possible answer) for the next guess.

So for example this run:

Starting new round, word is boons
Got:   seora
Sent:  ?XOXX
Got:   topsl
Sent:  XOX?X
Got:   monks
Sent:  XO?XO
Got:   bewig
Sent:  OXXXX
Got:   boons
Sent:  OOOOO
Round won with score 100

From the first three guesses, we already got "*oo*s" with an "n" somewhere and we still need to figure out one more letter. Now the beauty of this algorithm is that instead of guessing words which are similar to that form, it instead guesses the word which has no relation to previous guesses at all, trying to give more letters, hopefully revealing the missing letter. In this case it happens to also get the position for the missing "b" correctly, and concludes with the correct final guess "boons".

Here is the code:

import java.util.*;
import java.io.*;

class Lingo{
    public static String[] guessBestList = new String[]{
                                "",
                                "a",
                                "sa",
                                "tea",
                                "orae",
                                "seora", // 5
                                "ariose",
                                "erasion",
                                "serotina",
                                "tensorial",
                                "psalterion", // 10
                                "ulcerations",
                                "culteranismo",
                                "persecutional"};
    public static HashMap<Integer, ArrayList<String>> wordlist = new HashMap<Integer, ArrayList<String>>();

    public static void main(String[] args){
        readWordlist("wordlist.txt");
        Scanner scanner = new Scanner(System.in);
        int wordlen = Integer.parseInt(args[0]);
        int roundNum = 5;
        ArrayList<String> candidates = new ArrayList<String>();
        candidates.addAll(wordlist.get(wordlen));
        String guess = "";
        while(roundNum-- > 0){
            guess = guessBest(candidates, roundNum==4, roundNum==0);
            System.out.println(guess);
            String response = scanner.nextLine();
            if(isAllO(response)){
                break;
            }
            updateCandidates(candidates, guess, response);
            //print(candidates);
        }
    }

    public static void print(ArrayList<String> candidates){
        for(String str: candidates){
            System.err.println(str);
        }
        System.err.println();
    }

    public static void readWordlist(String path){
        try{
            BufferedReader reader = new BufferedReader(new FileReader(path));
            while(reader.ready()){
                String word = reader.readLine();
                if(!wordlist.containsKey(word.length())){
                    wordlist.put(word.length(), new ArrayList<String>());
                }
                wordlist.get(word.length()).add(word);
            }
        } catch (Exception e){
            System.exit(1);
        }
    }

    public static boolean isAllO(String response){
        for(int i=0; i<response.length(); i++){
            if(response.charAt(i) != 'O') return false;
        }
        return true;
    }

    public static String getResponse(String word, String guess){
        char[] wordChar = word.toCharArray();
        char[] result = new char[word.length()];
        Arrays.fill(result, 'X');
        for(int i=0; i<guess.length(); i++){
            if(guess.charAt(i) == wordChar[i]){
                result[i] = 'O';
                wordChar[i] = '_';
            }
        }
        for(int i=0; i<guess.length(); i++){
            if(result[i] == 'O') continue;
            for(int j=0; j<wordChar.length; j++){
                if(result[j] == 'O') continue;
                if(wordChar[j] == guess.charAt(i)){
                    result[i] = '?';
                    wordChar[j] = '_';
                    break;
                }
            }
        }
        return String.valueOf(result);
    }

    public static void updateCandidates(ArrayList<String> candidates, String guess, String response){
        for(int i=candidates.size()-1; i>=0; i--){
            String candidate = candidates.get(i);
            if(!response.equals(getResponse(candidate, guess))){
                candidates.remove(i);
            }
        }
    }

    public static int countMatchingCandidates(ArrayList<String> candidates, String guess, String response){
        int result = 0;
        for(String candidate: candidates){
            if(response.equals(getResponse(candidate, guess))){
                result++;
            }
        }
        return result;
    }

    public static String[] getSample(ArrayList<String> words, int size){
        String[] result = new String[size];
        int[] indices = new int[words.size()];
        for(int i=0; i<words.size(); i++){
            indices[i] = i;
        }
        Random rand = new Random(System.currentTimeMillis());
        for(int i=0; i<size; i++){
            int take = rand.nextInt(indices.length-i);
            result[i] = words.get(indices[take]);
            indices[take] = indices[indices.length-i-1];
        }
        return result;
    }

    public static String guessBest(ArrayList<String> candidates, boolean firstGuess, boolean lastGuess){
        if(candidates.size() == 1){
            return candidates.get(0);
        }
        String minGuess = candidates.get(0);
        int wordlen = minGuess.length();
        if(firstGuess && guessBestList[wordlen].length()==wordlen){
            return guessBestList[wordlen];
        }
        int minMatches = Integer.MAX_VALUE;
        String[] words;
        if(lastGuess){
            words = candidates.toArray(new String[0]);
        } else if (candidates.size()>10){
            words = bestWords(wordlist.get(wordlen), candidates, 25);
        } else {
            words = wordlist.get(wordlen).toArray(new String[0]);
        }
        for(String guess: words){
            double sumMatches = 0;
            for(String word: candidates){
                int matches = countMatchingCandidates(candidates, guess, getResponse(word, guess));
                if(matches == 0) matches = candidates.size();
                sumMatches += (matches-1)*(matches-1);
            }
            if(sumMatches < minMatches){
                minGuess = guess;
                minMatches = sumMatches;
            }
        }
        return minGuess;
    }

    public static String[] bestWords(ArrayList<String> words, ArrayList<String> candidates, int size){
        int[] charCount = new int[123];
        for(String candidate: candidates){
            for(int i=0; i<candidate.length(); i++){
                charCount[(int)candidate.charAt(i)]++;
            }
        }
        String[] tmp = (String[])words.toArray(new String[0]);
        Arrays.sort(tmp, new WordComparator(charCount));
        String[] result = new String[size+Math.min(size, candidates.size())];
        String[] sampled = getSample(candidates, Math.min(size, candidates.size()));
        for(int i=0; i<size; i++){
            result[i] = tmp[tmp.length-i-1];
            if(i < sampled.length){
                result[size+i] = sampled[i];
            }
        }
        return result;
    }

    static class WordComparator implements Comparator<String>{
        int[] charCount = null;

        public WordComparator(int[] charCount){
            this.charCount = charCount;
        }

        public Integer count(String word){
            int result = 0;
            int[] multiplier = new int[charCount.length];
            Arrays.fill(multiplier, 1);
            for(char chr: word.toCharArray()){
                result += multiplier[(int)chr]*this.charCount[(int)chr];
                multiplier[(int)chr] = 0;
            }
            return Integer.valueOf(result);
        }

        public int compare(String s1, String s2){
            return count(s1).compareTo(count(s2));
        }
    }
}

justhalf

Posted 2014-05-09T11:29:35.057

Reputation: 2 070

Awesome, this entry is seriously strong ! I remember seeing human players in the TV show using a similar strategy when they were unable to guess a word from the current clues. – SirDarius – 2014-05-19T19:13:57.760

3

Perl

There's still some room for improvement but at least it beats the included example players :)

Assumes write access to current directory for caching wordlists (to make it run a bit faster); will create wordlist.lenN.stor files using Storable. If this is an issue, remove read_cached_wordlist and change the code to use just read_wordlist.

Explanation

First, I build a histogram of letter frequencies in all the words in the current wordlist (build_histogram). Then I need to choose my next guess - which is done by find_best_word. The scoring algorithm is just adding the histogram values together, skipping letters already seen. This gives me a word containing the most frequent letters in the wordlist. If there is more than one word with a given score, I choose one randomly. Having found a word, I send it to the game engine, read the reply then try and do something useful with it :)

I maintain a set of conditions, that is, letters that can occur at a given position in a word. On start, this is just simple (['a'..'z'] x $len), but it is updated basing on the hints given in reply (see update_conds). I build a regex out of those conditions then and filter the wordlist through it.

During tests I have found out then that the aforementioned filtering doesn't handle ?s too well, hence the second filter (filter_wordlist_by_reply). This takes advantage of the fact that a letter marked as ? occurs in the word on a different position, and filters the wordlist accordingly.

These steps are repeated for every iteration of the main loop, unless the solution is found (or it is not possible to read from stdin anymore, which means a failure).

Code

#!perl
use strict;
use warnings;
use v5.10;
use Storable;

$|=1;

sub read_wordlist ($) {
    my ($len) = @_;
    open my $w, '<', 'wordlist.txt' or die $!;
    my @wordlist = grep { chomp; length $_ == $len } <$w>;
    close $w;
    \@wordlist
}

sub read_cached_wordlist ($) {
    my ($len) = @_;
    my $stor = "./wordlist.len$len.stor";
    if (-e $stor) {
        retrieve $stor
    } else {
        my $wl = read_wordlist $len;
        store $wl, $stor;
        $wl
    }
}

sub build_histogram ($) {
    my ($wl) = @_;
    my %histo = ();
    for my $word (@$wl) {
        $histo{$_}++ for ($word =~ /./g);
    }
    \%histo
}

sub score_word ($$) {
    my ($word, $histo) = @_;
    my $score = 0;
    my %seen = ();
    for my $l ($word =~ /./g) {
        if (not exists $seen{$l}) {
            $score += $histo->{$l};
            $seen{$l} = 1;
        }
    }
    $score
}

sub find_best_word ($$) {
    my ($wl, $histo) = @_;
    my @found = (sort { $b->[0] <=> $a->[0] } 
                 map [ score_word($_, $histo), $_ ], @$wl);
    return undef unless @found;
    my $maxscore = $found[0]->[0];
    my @max;
    for (@found) {
        last if $_->[0] < $maxscore;
        push @max, $_->[1];
    }
    $max[rand @max]
}

sub build_conds ($) {
    my ($len) = @_;
    my @c;
    push @c, ['a'..'z'] for 1..$len;
    \@c
}

sub get_regex ($) {
    my ($cond) = @_;
    local $" = '';
    my $r = join "", map { "[@$_]" } @$cond;
    qr/^$r$/
}

sub remove_cond ($$$) {
    my ($conds, $pos, $ch) = @_;
    return if (scalar @{$conds->[$pos]} == 1);
    return unless grep { $_ eq $ch } @{$conds->[$pos]};
    $conds->[$pos] = [ grep { $_ ne $ch } @{$conds->[$pos]} ]
}

sub add_cond ($$$) {
    my ($conds, $pos, $ch) = @_;
    return if (scalar @{$conds->[$pos]} == 1);
    return if grep { $_ eq $ch } @{$conds->[$pos]};
    push @{$conds->[$pos]}, $ch
}

sub update_conds ($$$$) {
    my ($word, $reply, $conds, $len) = @_;
    my %Xes;
    %Xes = ();
    for my $pos (reverse 0..$len-1) {
        my $r = substr $reply, $pos, 1;
        my $ch = substr $word, $pos, 1;

        if ($r eq 'O') {
            $conds->[$pos] = [$ch]
        }

        elsif ($r eq '?') {
            for my $a (0..$len-1) {
                if ($a == $pos) {
                    remove_cond $conds, $a, $ch
                } else {
                    unless (exists $Xes{$a} and $Xes{$a} eq $ch) {
                        add_cond($conds, $a, $ch);
                    }
                }
            }
        }

        elsif ($r eq 'X') {
            $Xes{$pos} = $ch;
            for my $a (0..$len-1) {
                remove_cond $conds, $a, $ch
            }
        }
    }
}

sub uniq ($) {
    my ($data) = @_;
    my %seen; 
    [ grep { !$seen{$_}++ } @$data ]
}

sub filter_wordlist_by_reply ($$$) {
    my ($wl, $word, $reply) = @_;
    return $wl unless $reply =~ /\?/;
    my $newwl = [];
    my $len = length $reply;
    for my $pos (0..$len-1) {
        my $r = substr $reply, $pos, 1;
        my $ch = substr $word, $pos, 1;
        next unless $r eq '?';
        for my $a (0..$len-1) {
            if ($a != $pos) {
                if ('O' ne substr $reply, $a, 1) {
                    push @$newwl, grep { $ch eq substr $_, $a, 1 } @$wl
                }
            }
        }
    }
    uniq $newwl
}

my $len = $ARGV[0] or die "no length";
my $wl = read_cached_wordlist $len;
my $conds = build_conds $len;

my $c=0;
do {
    my $histo = build_histogram $wl;
    my $word = find_best_word $wl, $histo;
    die "no candidates" unless defined $word;
    say $word;
    my $reply = <STDIN>; 
    chomp $reply;
    exit 1 unless length $reply;
    exit 0 if $reply =~ /^O+$/;
    update_conds $word, $reply, $conds, $len;
    $wl = filter_wordlist_by_reply $wl, $word, $reply;
    $wl = [ grep { $_ =~ get_regex $conds } @$wl ]
} while 1

chinese perl goth

Posted 2014-05-09T11:29:35.057

Reputation: 1 089

1My rules originally forbade writing to disk, but I make it an exception to allow caching of the word list, because the big one I found makes the whole thing annoyingly slow to test :) – SirDarius – 2014-05-15T08:27:27.497

This entry works better than my own (unpublished yet) attempts. Could you explain your algorithm a bit ? – SirDarius – 2014-05-15T12:04:28.000

I've added a short explanation; fixed the code formatting a bit as well. – chinese perl goth – 2014-05-15T17:34:26.077

@SirDarius: I don't think there would be any loss if any particular test uses a word list that contains only entries of the proper length. While it shouldn't be overly difficult for a program to ignore words within the file whose length is other than specified, the existence of such words would at minimum slow down testing. Also, I wonder if there would be value in allowing submissions to specify an optional program which, given a word list and N, would send to standard output a word list formatted in whatever fashion is most helpful... – supercat – 2014-06-01T22:08:43.507

...and allow the main program to use that rather than a raw word list (so if some pre-analysis is needed, it will only have to be done once for each word length, rather than once per game). – supercat – 2014-06-01T22:10:16.100

@supercat i'll update the challenge so that files like wordlistNN.txt are available. Also you can do like the perl entry and cache word list data as you want. – SirDarius – 2014-06-02T23:29:40.180