C, averaging 500+ 1500 1750 points
This is a relatively minor improvement over version 2 (see below for notes on previous versions). There are two parts. First: Instead of selecting boards randomly from the pool, the program now iterates over every board in the pool, using each one in turn before returning to the top of the pool and repeating. (Since the pool is being modified while this iteration occurs, there will still be boards that get chosen twice in a row, or worse, but this is not a serious concern.) The second change is that the program now tracks when the pool changes, and if the program goes for too long without improving the pool's contents, it determines that the search has "stalled", empties the pool, and starts over with a new search. It continues to do this until the two minutes are up.
I had initially thought that I would be employing some kind of heuristic search to get beyond the 1500-point range. @mellamokb's comment about a 4527-point board led me to assume that there was plenty of room for improvement. However, we are using a relatively small wordlist. The 4527-point board was scoring using YAWL, which is the most inclusive wordlist out there -- it's even bigger than the official US Scrabble wordlist. With this in mind, I re-examined the boards my program had found and noticed that there appeared to be a limited set of boards above 1700 or so. So for example, I had multiple runs that had discovered a board scoring 1726, but it was always the exact same board that was found (ignoring rotations and reflections).
As another test, I ran my program using YAWL as the dictionary, and it found the 4527-point board after about a dozen runs. Given this, I am hypothesizing that my program is already at the upper limit of the search space, and therefore the rewrite I was planning would introduce extra complexity for very little gain.
Here is my list of the five highest-scoring boards my program has found using the english.0
wordlist:
1735 : D C L P E I A E R N T R S E G S
1738 : B E L S R A D G T I N E S E R S
1747 : D C L P E I A E N T R D G S E R
1766 : M P L S S A I E N T R N D E S G
1772: G R E P T N A L E S I T D R E S
My belief is that the 1772 "grep board" (as I've taken to calling it), with 531 words, is the highest-scoring board possible with this wordlist. Over 50% of my program's two-minute runs end with this board. I've also left my program running overnight without it finding anything better. So if there is a higher-scoring board, it would likely have to have some aspect that defeats the program's search technique. A board in which every possible small change to the layout causes a huge drop in the total score, for example, might never be discovered by my program. My hunch is that such a board is very unlikely to exist.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <time.h>
#define WORDLISTFILE "./english.0"
#define XSIZE 4
#define YSIZE 4
#define BOARDSIZE (XSIZE * YSIZE)
#define DIEFACES 6
#define WORDBUFSIZE 256
#define MAXPOOLSIZE 32
#define STALLPOINT 64
#define RUNTIME 120
/* Generate a random int from 0 to N-1.
*/
#define random(N) ((int)(((double)(N) * rand()) / (RAND_MAX + 1.0)))
static char const dice[BOARDSIZE][DIEFACES] = {
"aaeegn", "elrtty", "aoottw", "abbjoo",
"ehrtvw", "cimotu", "distty", "eiosst",
"delrvy", "achops", "himnqu", "eeinsu",
"eeghnw", "affkps", "hlnnrz", "deilrx"
};
/* The dictionary is represented in memory as a tree. The tree is
* represented by its arcs; the nodes are implicit. All of the arcs
* emanating from a single node are stored as a linked list in
* alphabetical order.
*/
typedef struct {
int letter:8; /* the letter this arc is labelled with */
int arc:24; /* the node this arc points to (i.e. its first arc) */
int next:24; /* the next sibling arc emanating from this node */
int final:1; /* true if this arc is the end of a valid word */
} treearc;
/* Each of the slots that make up the playing board is represented
* by the die it contains.
*/
typedef struct {
unsigned char die; /* which die is in this slot */
unsigned char face; /* which face of the die is showing */
} slot;
/* The following information defines a game.
*/
typedef struct {
slot board[BOARDSIZE]; /* the contents of the board */
int score; /* how many points the board is worth */
} game;
/* The wordlist is stored as a binary search tree.
*/
typedef struct {
int item: 24; /* the identifier of a word in the list */
int left: 16; /* the branch with smaller identifiers */
int right: 16; /* the branch with larger identifiers */
} listnode;
/* The dictionary.
*/
static treearc *dictionary;
static int heapalloc;
static int heapsize;
/* Every slot's immediate neighbors.
*/
static int neighbors[BOARDSIZE][9];
/* The wordlist, used while scoring a board.
*/
static listnode *wordlist;
static int listalloc;
static int listsize;
static int xcursor;
/* The game that is currently being examined.
*/
static game G;
/* The highest-scoring game seen so far.
*/
static game bestgame;
/* Variables to time the program and display stats.
*/
static time_t start;
static int boardcount;
static int allscores;
/* The pool contains the N highest-scoring games seen so far.
*/
static game pool[MAXPOOLSIZE];
static int poolsize;
static int cutoffscore;
static int stallcounter;
/* Some buffers shared by recursive functions.
*/
static char wordbuf[WORDBUFSIZE];
static char gridbuf[BOARDSIZE];
/*
* The dictionary is stored as a tree. It is created during
* initialization and remains unmodified afterwards. When moving
* through the tree, the program tracks the arc that points to the
* current node. (The first arc in the heap is a dummy that points to
* the root node, which otherwise would have no arc.)
*/
static void initdictionary(void)
{
heapalloc = 256;
dictionary = malloc(256 * sizeof *dictionary);
heapsize = 1;
dictionary->arc = 0;
dictionary->letter = 0;
dictionary->next = 0;
dictionary->final = 0;
}
static int addarc(int arc, char ch)
{
int prev, a;
prev = arc;
a = dictionary[arc].arc;
for (;;) {
if (dictionary[a].letter == ch)
return a;
if (!dictionary[a].letter || dictionary[a].letter > ch)
break;
prev = a;
a = dictionary[a].next;
}
if (heapsize >= heapalloc) {
heapalloc *= 2;
dictionary = realloc(dictionary, heapalloc * sizeof *dictionary);
}
a = heapsize++;
dictionary[a].letter = ch;
dictionary[a].final = 0;
dictionary[a].arc = 0;
if (prev == arc) {
dictionary[a].next = dictionary[prev].arc;
dictionary[prev].arc = a;
} else {
dictionary[a].next = dictionary[prev].next;
dictionary[prev].next = a;
}
return a;
}
static int validateword(char *word)
{
int i;
for (i = 0 ; word[i] != '\0' && word[i] != '\n' ; ++i)
if (word[i] < 'a' || word[i] > 'z')
return 0;
if (word[i] == '\n')
word[i] = '\0';
if (i < 3)
return 0;
for ( ; *word ; ++word, --i) {
if (*word == 'q') {
if (word[1] != 'u')
return 0;
memmove(word + 1, word + 2, --i);
}
}
return 1;
}
static void createdictionary(char const *filename)
{
FILE *fp;
int arc, i;
initdictionary();
fp = fopen(filename, "r");
while (fgets(wordbuf, sizeof wordbuf, fp)) {
if (!validateword(wordbuf))
continue;
arc = 0;
for (i = 0 ; wordbuf[i] ; ++i)
arc = addarc(arc, wordbuf[i]);
dictionary[arc].final = 1;
}
fclose(fp);
}
/*
* The wordlist is stored as a binary search tree. It is only added
* to, searched, and erased. Instead of storing the actual word, it
* only retains the word's final arc in the dictionary. Thus, the
* dictionary needs to be walked in order to print out the wordlist.
*/
static void initwordlist(void)
{
listalloc = 16;
wordlist = malloc(listalloc * sizeof *wordlist);
listsize = 0;
}
static int iswordinlist(int word)
{
int node, n;
n = 0;
for (;;) {
node = n;
if (wordlist[node].item == word)
return 1;
if (wordlist[node].item > word)
n = wordlist[node].left;
else
n = wordlist[node].right;
if (!n)
return 0;
}
}
static int insertword(int word)
{
int node, n;
if (!listsize) {
wordlist->item = word;
wordlist->left = 0;
wordlist->right = 0;
++listsize;
return 1;
}
n = 0;
for (;;) {
node = n;
if (wordlist[node].item == word)
return 0;
if (wordlist[node].item > word)
n = wordlist[node].left;
else
n = wordlist[node].right;
if (!n)
break;
}
if (listsize >= listalloc) {
listalloc *= 2;
wordlist = realloc(wordlist, listalloc * sizeof *wordlist);
}
n = listsize++;
wordlist[n].item = word;
wordlist[n].left = 0;
wordlist[n].right = 0;
if (wordlist[node].item > word)
wordlist[node].left = n;
else
wordlist[node].right = n;
return 1;
}
static void clearwordlist(void)
{
listsize = 0;
G.score = 0;
}
static void scoreword(char const *word)
{
int const scoring[] = { 0, 0, 0, 1, 1, 2, 3, 5 };
int n, u;
for (n = u = 0 ; word[n] ; ++n)
if (word[n] == 'q')
++u;
n += u;
G.score += n > 7 ? 11 : scoring[n];
}
static void addwordtolist(char const *word, int id)
{
if (insertword(id))
scoreword(word);
}
static void _printwords(int arc, int len)
{
int a;
while (arc) {
a = len + 1;
wordbuf[len] = dictionary[arc].letter;
if (wordbuf[len] == 'q')
wordbuf[a++] = 'u';
if (dictionary[arc].final) {
if (iswordinlist(arc)) {
wordbuf[a] = '\0';
if (xcursor == 4) {
printf("%s\n", wordbuf);
xcursor = 0;
} else {
printf("%-16s", wordbuf);
++xcursor;
}
}
}
_printwords(dictionary[arc].arc, a);
arc = dictionary[arc].next;
}
}
static void printwordlist(void)
{
xcursor = 0;
_printwords(1, 0);
if (xcursor)
putchar('\n');
}
/*
* The board is stored as an array of oriented dice. To score a game,
* the program looks at each slot on the board in turn, and tries to
* find a path along the dictionary tree that matches the letters on
* adjacent dice.
*/
static void initneighbors(void)
{
int i, j, n;
for (i = 0 ; i < BOARDSIZE ; ++i) {
n = 0;
for (j = 0 ; j < BOARDSIZE ; ++j)
if (i != j && abs(i / XSIZE - j / XSIZE) <= 1
&& abs(i % XSIZE - j % XSIZE) <= 1)
neighbors[i][n++] = j;
neighbors[i][n] = -1;
}
}
static void printboard(void)
{
int i;
for (i = 0 ; i < BOARDSIZE ; ++i) {
printf(" %c", toupper(dice[G.board[i].die][G.board[i].face]));
if (i % XSIZE == XSIZE - 1)
putchar('\n');
}
}
static void _findwords(int pos, int arc, int len)
{
int ch, i, p;
for (;;) {
ch = dictionary[arc].letter;
if (ch == gridbuf[pos])
break;
if (ch > gridbuf[pos] || !dictionary[arc].next)
return;
arc = dictionary[arc].next;
}
wordbuf[len++] = ch;
if (dictionary[arc].final) {
wordbuf[len] = '\0';
addwordtolist(wordbuf, arc);
}
gridbuf[pos] = '.';
for (i = 0 ; (p = neighbors[pos][i]) >= 0 ; ++i)
if (gridbuf[p] != '.')
_findwords(p, dictionary[arc].arc, len);
gridbuf[pos] = ch;
}
static void findwordsingrid(void)
{
int i;
clearwordlist();
for (i = 0 ; i < BOARDSIZE ; ++i)
gridbuf[i] = dice[G.board[i].die][G.board[i].face];
for (i = 0 ; i < BOARDSIZE ; ++i)
_findwords(i, 1, 0);
}
static void shuffleboard(void)
{
int die[BOARDSIZE];
int i, n;
for (i = 0 ; i < BOARDSIZE ; ++i)
die[i] = i;
for (i = BOARDSIZE ; i-- ; ) {
n = random(i);
G.board[i].die = die[n];
G.board[i].face = random(DIEFACES);
die[n] = die[i];
}
}
/*
* The pool contains the N highest-scoring games found so far. (This
* would typically be done using a priority queue, but it represents
* far too little of the runtime. Brute force is just as good and
* simpler.) Note that the pool will only ever contain one board with
* a particular score: This is a cheap way to discourage the pool from
* filling up with almost-identical high-scoring boards.
*/
static void addgametopool(void)
{
int i;
if (G.score < cutoffscore)
return;
for (i = 0 ; i < poolsize ; ++i) {
if (G.score == pool[i].score) {
pool[i] = G;
return;
}
if (G.score > pool[i].score)
break;
}
if (poolsize < MAXPOOLSIZE)
++poolsize;
if (i < poolsize) {
memmove(pool + i + 1, pool + i, (poolsize - i - 1) * sizeof *pool);
pool[i] = G;
}
cutoffscore = pool[poolsize - 1].score;
stallcounter = 0;
}
static void selectpoolmember(int n)
{
G = pool[n];
}
static void emptypool(void)
{
poolsize = 0;
cutoffscore = 0;
stallcounter = 0;
}
/*
* The program examines as many boards as it can in the given time,
* and retains the one with the highest score. If the program is out
* of time, then it reports the best-seen game and immediately exits.
*/
static void report(void)
{
findwordsingrid();
printboard();
printwordlist();
printf("score = %d\n", G.score);
fprintf(stderr, "// score: %d points (%d words)\n", G.score, listsize);
fprintf(stderr, "// %d boards examined\n", boardcount);
fprintf(stderr, "// avg score: %.1f\n", (double)allscores / boardcount);
fprintf(stderr, "// runtime: %ld s\n", time(0) - start);
}
static void scoreboard(void)
{
findwordsingrid();
++boardcount;
allscores += G.score;
addgametopool();
if (bestgame.score < G.score) {
bestgame = G;
fprintf(stderr, "// %ld s: board %d scoring %d\n",
time(0) - start, boardcount, G.score);
}
if (time(0) - start >= RUNTIME) {
G = bestgame;
report();
exit(0);
}
}
static void restartpool(void)
{
emptypool();
while (poolsize < MAXPOOLSIZE) {
shuffleboard();
scoreboard();
}
}
/*
* Making small modifications to a board.
*/
static void turndie(void)
{
int i, j;
i = random(BOARDSIZE);
j = random(DIEFACES - 1) + 1;
G.board[i].face = (G.board[i].face + j) % DIEFACES;
}
static void swapdice(void)
{
slot t;
int p, q;
p = random(BOARDSIZE);
q = random(BOARDSIZE - 1);
if (q >= p)
++q;
t = G.board[p];
G.board[p] = G.board[q];
G.board[q] = t;
}
/*
*
*/
int main(void)
{
int i;
start = time(0);
srand((unsigned int)start);
createdictionary(WORDLISTFILE);
initwordlist();
initneighbors();
restartpool();
for (;;) {
for (i = 0 ; i < poolsize ; ++i) {
selectpoolmember(i);
turndie();
scoreboard();
selectpoolmember(i);
swapdice();
scoreboard();
}
++stallcounter;
if (stallcounter >= STALLPOINT) {
fprintf(stderr, "// stalled; restarting search\n");
restartpool();
}
}
return 0;
}
Notes for version 2 (June 9)
Here's one way to use the initial version of my code as a jumping-off point. The changes to this version consist of less than 100 lines, but tripled the average game score.
In this version, the program maintains a "pool" of candidates, consisting of the N highest-scoring boards that the program has generated so far. Every time a new board is generated, it is added to the pool and the lowest-scoring board in the pool is removed (which may very well be the board that was just added, if its score is lower than what's already there). The pool is initially filled with randomly generated boards, after which it keeps a constant size throughout the program's run.
The program's main loop consists of of selecting a random board from the pool and altering it, determining this new board's score and then putting it into the pool (if it scores well enough). In this fashion the program is continually refining high-scoring boards. The main activity is to make stepwise, incremental improvements, but the size of the pool also allows the program to find multi-step improvements that temporarily make a board's score worse before it can make it better.
Typically this program finds a good local maximum rather quickly, after which presumably any better maximum is too distant to be found. And so once again there's little point to running the program longer than 10 seconds. This might be improved by e.g. having the program detect this situation and start a fresh search with a new pool of candidates. However, this would net only a marginal increase. A proper heuristic search technique would likely be a better avenue of exploration.
(Side note: I saw that this version was generating about 5k boards/sec. Since the first version typically did 20k boards/sec, I was initially concerned. Upon profiling, however, I discovered that the extra time was spent managing the wordlist. In other words, it was entirely due to the program finding so many more words per board. In light of this, I considered changing the code to manage the wordlist, but given that this program is only using 10 of its allotted 120 seconds, such an optimization would be very premature.)
Notes for version 1 (June 2)
This is a very, very simple solution. All it does it generate random boards, and then after 10 seconds it outputs the one with the highest score. (I defaulted to 10 seconds because the extra 110 seconds permitted by the problem specification typically doesn't improve the final solution found enough to be worth waiting for.) So it's extremely dumb. However, it has all the infrastructure to make a good starting point for a more intelligent search, and if anyone wishes to make use of it before the deadline, I encourage them to do so.
The program starts by reading the dictionary into a tree structure. (The form isn't quite as optimized as it could be, but it's more than good enough for these purposes.) After some other basic initialization, it then begins generating boards and scoring them. The program examines about 20k boards per second on my machine, and after about 200k boards the random approach starts to run dry.
Since only one board is actually being evaluated at any given time, the scoring data is stored in global variables. This allows me to minimize the amount of constant data that has to be passed as arguments to the recursive functions. (I'm sure this will give some people hives, and to them I apologize.) The wordlist is stored as a binary search tree. Every word found has to be looked up in the wordlist, so that duplicate words aren't counted twice. The wordlist is only needed during the evaulation process, however, so it's discarded after the score is found. Thus, at the end of the program, the chosen board has to be scored all over again, so that the wordlist can be printed out.
Fun fact: The average score for a randomly-generated Boggle board, as scored by english.0
, is 61.7 points.
2
I would prefer to have a dictionary provided to standardize the objective. Note also that this is not a new idea, as a simple google search will reveal :) The highest score I've seen is
– mellamokb – 2012-04-26T20:22:37.2004527
(1414
total words), found here: http://ai.stanford.edu/~chuongdo/boggle/index.html@mellamokb Sure, the task's not new, but making it as short as possible is the trick.... ;-) – Gaffi – 2012-04-26T20:40:15.017
4Is the program required to terminate this century? – Peter Taylor – 2012-04-26T21:52:34.960
@mellamokb great link. They use a genetic algorithm to randomly find good boards. I would not be surprised if a more intelligent algorithm could find a better board though. – captncraig – 2012-04-26T22:06:46.423
@PeterTaylor Ideally, yes. :-) I had envisioned people coming up with various ways to take shortcuts, rather than iterate through every possible permutation - though anyone wanting (willing) to go that route is free to do so. – Gaffi – 2012-04-26T22:22:47.563
I don't think this is a great idea for a code-golf. Why not just make it a code-challenge? – gnibbler – 2012-04-27T12:01:54.443
@gnibbler If I do that, then what would be the winning criteria? Best Boggle score? AFAIK, there is just one correct answer there, and multiple people could get it. I'm not opposed to your suggestion, but I'm not sure how it would work. – Gaffi – 2012-04-27T12:18:12.043
Sometimes code challenges are made based on the efficiency of the algorithm, i.e., how long it takes the program to run on standardized hardware. – mellamokb – 2012-04-27T13:31:02.607
@mellamokb Is the a public resource for this? I am aware of ideone and codepad, but I don't think these will cut it, will they? I don't have a machine myself that is capable of running tests for all cases. – Gaffi – 2012-04-27T14:39:19.280
@Gaffi: I'm not so sure the maximum point value is currently actually known. The search space is so large that every algorithm I've seen has used AI or some sort of genetic algorithm. It may still be possible to find a larger score. Nevertheless, it doesn't matter what is "known" to be the largest score, because the programs written in this challenge have to find the highest they can at runtime without cheating using prior knowledge, so perhaps the criteria can be the one that generates the largest point value while running in less then 30 seconds, or something like that. – mellamokb – 2012-04-27T19:12:22.723
@mellamokb I suppose I can live with that... I will adapt the question. I think I'll allow for multiple runs (best of 20, let's say) in case the (assumed) randomness of a given algorithm returns a low 'best'. – Gaffi – 2012-04-27T20:10:10.023
What is
Qu
? Is it just a mistake or a place where I should do "Q" and "U" at once? – Konrad Borowski – 2012-04-28T10:31:24.5871@GlitchMr In English, Q is typically only ever used with U. Boggle accounts for this by putting the two letters on the same die as one unit. – Gaffi – 2012-04-28T13:31:33.533
1The word list specification is unclear. Are you counting only those words listed in english.0 in lower case? (Standard word game rules exclude abbreviations/initialisms and proper nouns). – Peter Taylor – 2012-06-08T16:30:05.150
@PeterTaylor In my own code, I take all words (case insensitive) excluding any with
'
(i.e.it's
,ca'nt
). No, I did not specify these constraints, but I agree yours work fairly well. Based on this specific list, how does this sound? All words with: -no punctuation; -a lower-case initial letter. I did not scan the entire file, but these two rules alone cover (exclude) a large amount of words. Do you have another suggestion? – Gaffi – 2012-06-08T16:42:14.0601I was thinking regex
^([a-pr-z]|qu){3,16}$
(which would incorrectly exclude 3-letter words with qu, but there aren't any). – Peter Taylor – 2012-06-08T17:00:57.310@PeterTaylor Works for me... Fixed. – Gaffi – 2012-06-08T18:49:20.703
Since I don't think I'm going to get my current idea working in time, I'll throw it out for anyone who wants to try to beat the deadline. Basically I inverted the problem: first find a really good pair of words (i.e. two words which individually have substrings giving a score of 4+ points per letter) which between them fit in 16 dice, and then try to find a permutation of dice which gets lots of extra words. – Peter Taylor – 2012-06-25T15:15:29.283
Are you familiar with POTM? The problem type and the 2-minute time limit are reminiscent of that contest :) – hobbs – 2012-07-06T17:30:28.160
@hobbs I was not, but I'm looking now. Thanks for the info! – Gaffi – 2012-07-06T17:51:19.880