1P5: Word Changer

This was written as part of the First Periodic Premier Programming Puzzle Push.

The Game

A starting word and an ending word of the same length are provided. The objective of the game is to change one letter in the starting word to form a different valid word, repeating this step until the ending word is reached, using the fewest number of steps. For example, given the words TREE and FLED, the output would be:

TREE
FREE
FLEE
FLED
2

Specifications

  • The Wikipedia articles for the OWL or SOWPODS might be a useful starting point as far as word lists go.
  • The program should support two ways of selecting the start and end words:
    1. Specified by the user via command-line, stdin, or whatever is suitable for your language of choice (just mention what you're doing).
    2. Selecting 2 words at random from the file.
  • The starting and ending words, as well as all interim words, should be the same length.
  • Each step should be printed out on its own line.
  • The final line of your output should be the number of interim steps that were required to get between the starting and ending words.
  • If a match cannot be found between the starting and ending words, the output should consist of 3 lines: the beginning word, the ending word, and the word OY.
  • Include the Big O notation for your solution in your answer.
  • Please include 10 unique starting and ending word pairs (with their output, of course) to show the steps your program produces. (To save space, while your program should output these on individual lines, you may consolidate each run into a single line for posting, replacing newlines with spaces and separating runs with commas.)

Goals / Winning Criteria

  • The fastest/best Big O solution producing the shortest interim steps after one week will win.
  • If a tie results from the Big O criteria, the shortest code will win.
  • If there is still a tie, the first solution to reach its fastest and shortest revision will win.

Tests / Sample Output

DIVE
DIME
DAME
NAME
2

PEACE
PLACE
PLATE
SLATE
2

HOUSE
HORSE
GORSE
GORGE
2

POLE
POSE
POST
PAST
FAST
3

Validation

I am working on a script that can be used to validate the output.

It will:

  1. Ensure that each word is valid.
  2. Ensure that each word is exactly 1 letter different from the previous word.

It will not:

  1. Check that the shortest number of steps were used.

Once I've gotten that written, I will of course update this post. (:
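
In the meantime, here is a rough sketch of what such a validator might look like in Java (illustrative only, not the final script; it assumes the word list lives in a file named sowpods with one uppercase word per line, and that the ladder arrives one word per line on stdin, with the trailing step-count line ignored):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

// Hypothetical validator sketch: checks rules 1 and 2 above.
public class Validate {
    public static void main(String[] args) throws IOException {
        Set<String> dict = new HashSet<>(Files.readAllLines(Paths.get("sowpods")));
        Scanner in = new Scanner(System.in);
        String prev = null;
        while (in.hasNext("[A-Za-z]+")) { // stops at the numeric count line
            String word = in.next().toUpperCase();
            if (!dict.contains(word))
                throw new IllegalArgumentException(word + " is not in the word list");
            if (prev != null && diff(prev, word) != 1)
                throw new IllegalArgumentException(prev + " -> " + word + " is not a single-letter change");
            prev = word;
        }
        System.out.println("OK");
    }

    // Number of differing positions, or -1 if the lengths differ.
    private static int diff(String a, String b) {
        if (a.length() != b.length()) return -1;
        int d = 0;
        for (int i = 0; i < a.length(); i++)
            if (a.charAt(i) != b.charAt(i)) d++;
        return d;
    }
}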

Rebecca Chernoff

Posted 2011-05-05T18:21:20.110

Reputation: 397

This reduces itself to the Six Degrees of Wikipedia problem. While a heuristic can be developed based upon the Levenshtein Distance of the source and target, the most correct approach for single source, single destination appears to be a breadth-first search based upon the connectivity of all words in, for example, the Oxford English dictionary. Extra credit: because OED is small, run Floyd-Warshall to create a lookup table of all solutions when caching is desired. :)

– MrGomez – 2012-05-16T19:37:58.813

Do the intermediate words have to be the same length as the start and end word? – Peter Taylor – 2011-05-05T18:30:20.480

@PeterTaylor, yes - I'll edit that clarification in. – Rebecca Chernoff – 2011-05-05T18:32:23.800

I would have guessed that if only one letter can be changed per step that that is implied ;) – Joey – 2011-05-05T18:36:24.373

@Joey, often puzzles of this kind allow the change to be a deletion or insertion. – Peter Taylor – 2011-05-05T18:41:28.780

@Rebecca, does your dictionary not have GORSE? HOUSE HORSE GORSE GORGE is optimal. – Peter Taylor – 2011-05-05T18:43:14.317

Peter: I guess optimality will be judged from the word list. – Joey – 2011-05-05T18:45:55.443

@PeterTaylor, that's what I get for copying from a random website I found a few examples from. |: – Rebecca Chernoff – 2011-05-05T18:47:39.837

It seems strange to me that performing 3 operations to get from HOUSE to GORGE is reported as 2. I realize there are 2 intermediate words, so it does make sense, but # of operations would be more intuitive. – Matthew Read – 2011-05-05T19:05:14.983

@Rebecca, can we assume that the alphabet is A-Z and that the longest words required are no longer than 13 letters? (I don't think it makes much difference to the complexity, which will be dominated by the number of edges in the graph in the worst case, but it allows some convenient hackery). – Peter Taylor – 2011-05-05T19:05:23.597

@Peter, according to the sowpods wikipedia page there are ~15k words longer than 13 letters – gnibbler – 2011-05-06T00:53:39.097

I don't mean to be a know-it-all, but the puzzle actually has a name: it was invented by Lewis Carroll. http://en.wikipedia.org/wiki/Word_ladder

– st0le – 2011-05-06T04:11:10.093

You have an undecidable goal in the question: The fastest/best Big O solution producing the shortest interim steps after one week will win. Since you can't guarantee that the fastest solution is also the one which uses the fewest steps, you should state a preference for the case where one solution uses fewer steps but reaches the goal later. – user unknown – 2011-05-06T04:24:32.313

I just want to confirm BAT and CAT will have zero steps, right? – st0le – 2011-05-06T05:24:23.347

@gnibbler, there are longer words but they're boring. There are 9116 14-letter words, so 41546170 pairs of 14-letter words, and only 2258 of those pairs have a solution. If you pick a random pair of 7-letter words the chances of a solution are about 1 in 7. With 8-letter words it's about 1 in 240, and then it gets worse. So I don't think it unreasonable to ask whether we can neglect words longer than a threshold. – Peter Taylor – 2011-05-06T18:56:14.183

If anyone wants a nasty test case with SOWPODS, try ENROBING LINCHETS. – Peter Taylor – 2011-05-06T22:27:58.853

@userunknown, using the shortest path possible is a basic part of the game. – Rebecca Chernoff – 2011-05-07T23:46:43.763

@Rebecca: I see - shortest path is mandatory, to be a solution at all. Thanks. – user unknown – 2011-05-08T00:28:01.557

Answers

Since length is listed as a criterion, here's the golfed version at 1681 characters (could probably still be improved by 10%):

import java.io.*;import java.util.*;public class W{public static void main(String[]
a)throws Exception{int n=a.length<1?5:a[0].length(),p,q;String f,t,l;S w=new S();Scanner
s=new Scanner(new
File("sowpods"));while(s.hasNext()){f=s.next();if(f.length()==n)w.add(f);}if(a.length<1){String[]x=w.toArray(new
String[0]);Random
r=new Random();q=x.length;p=r.nextInt(q);q=r.nextInt(q-1);f=x[p];t=x[p>q?q:q+1];}else{f=a[0];t=a[1];}H<S>
A=new H(),B=new H(),C=new H();for(String W:w){A.put(W,new
S());for(p=0;p<n;p++){char[]c=W.toCharArray();c[p]='.';l=new
String(c);A.get(W).add(l);S z=B.get(l);if(z==null)B.put(l,z=new
S());z.add(W);}}for(String W:A.keySet()){C.put(W,w=new S());for(String
L:A.get(W))for(String b:B.get(L))if(b!=W)w.add(b);}N m,o,ñ;H<N> N=new H();N.put(f,m=new
N(f,t));N.put(t,o=new N(t,t));m.k=0;N[]H=new
N[3];H[0]=m;p=H[0].h;while(0<1){if(H[0]==null){if(H[1]==H[2])break;H[0]=H[1];H[1]=H[2];H[2]=null;p++;continue;}if(p>=o.k-1)break;m=H[0];H[0]=m.x();if(H[0]==m)H[0]=null;for(String
v:C.get(m.s)){ñ=N.get(v);if(ñ==null)N.put(v,ñ=new N(v,t));if(m.k+1<ñ.k){if(ñ.k<ñ.I){q=ñ.k+ñ.h-p;N
Ñ=ñ.x();if(H[q]==ñ)H[q]=Ñ==ñ?null:Ñ;}ñ.b=m;ñ.k=m.k+1;q=ñ.k+ñ.h-p;if(H[q]==null)H[q]=ñ;else{ñ.n=H[q];ñ.p=ñ.n.p;ñ.n.p=ñ.p.n=ñ;}}}}if(o.b==null)System.out.println(f+"\n"+t+"\nOY");else{String[]P=new
String[o.k+2];P[o.k+1]=o.k-1+"";m=o;for(q=m.k;q>=0;q--){P[q]=m.s;m=m.b;}for(String
W:P)System.out.println(W);}}}class N{String s;int k,h,I=(1<<30)-1;N b,p,n;N(String S,String
d){s=S;for(k=0;k<d.length();k++)if(d.charAt(k)!=S.charAt(k))h++;k=I;p=n=this;}N
x(){N r=n;n.p=p;p.n=n;n=p=this;return r;}}class S extends HashSet<String>{}class H<V>extends
HashMap<String,V>{}

The ungolfed version, which uses package names and methods and doesn't give warnings or extend classes just to alias them, is:

package com.akshor.pjt33;

import java.io.*;
import java.util.*;

// WordLadder partially golfed and with reduced dependencies
//
// Variables used in complexity analysis:
// n is the word length
// V is the number of words (vertex count of the graph)
// E is the number of edges
// hash is the cost of a hash insert / lookup - I will assume it's constant, but without completely brushing it under the carpet
public class WordLadder2
{
    private Map<String, Set<String>> wordsToWords = new HashMap<String, Set<String>>();

    // Initialisation cost: O(V * n * (n + hash) + E * hash)
    private WordLadder2(Set<String> words)
    {
        Map<String, Set<String>> wordsToLinks = new HashMap<String, Set<String>>();
        Map<String, Set<String>> linksToWords = new HashMap<String, Set<String>>();

        // Cost: O(Vn * (n + hash))
        for (String word : words)
        {
            // Cost: O(n*(n + hash))
            for (int i = 0; i < word.length(); i++)
            {
                // Cost: O(n + hash)
                char[] ch = word.toCharArray();
                ch[i] = '.';
                String link = new String(ch).intern();
                add(wordsToLinks, word, link);
                add(linksToWords, link, word);
            }
        }

        // Cost: O(V * n * hash + E * hash)
        for (Map.Entry<String, Set<String>> from : wordsToLinks.entrySet()) {
            String src = from.getKey();
            wordsToWords.put(src, new HashSet<String>());
            for (String link : from.getValue()) {
                Set<String> to = linksToWords.get(link);
                for (String snk : to) {
                    // Note: equality test is safe here. Cost is O(hash)
                    if (snk != src) add(wordsToWords, src, snk);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException
    {
        // Cost: O(filelength + num_words * hash)
        Map<Integer, Set<String>> wordsByLength = new HashMap<Integer, Set<String>>();
        BufferedReader br = new BufferedReader(new FileReader("sowpods"), 8192);
        String line;
        while ((line = br.readLine()) != null) add(wordsByLength, line.length(), line);

        if (args.length == 2) {
            String from = args[0].toUpperCase();
            String to = args[1].toUpperCase();
            new WordLadder2(wordsByLength.get(from.length())).findPath(from, to);
        }
        else {
            // 5-letter words are the most interesting.
            String[] _5 = wordsByLength.get(5).toArray(new String[0]);
            Random rnd = new Random();
            int f = rnd.nextInt(_5.length), g = rnd.nextInt(_5.length - 1);
            if (g >= f) g++;
            new WordLadder2(wordsByLength.get(5)).findPath(_5[f], _5[g]);
        }
    }

    // O(E * hash)
    private void findPath(String start, String dest) {
        Node startNode = new Node(start, dest);
        startNode.cost = 0; startNode.backpointer = startNode;

        Node endNode = new Node(dest, dest);

        // Node lookup
        Map<String, Node> nodes = new HashMap<String, Node>();
        nodes.put(start, startNode);
        nodes.put(dest, endNode);

        // Heap
        Node[] heap = new Node[3];
        heap[0] = startNode;
        int base = heap[0].heuristic;

        // O(E * hash)
        while (true) {
            if (heap[0] == null) {
                if (heap[1] == heap[2]) break;
                heap[0] = heap[1]; heap[1] = heap[2]; heap[2] = null; base++;
                continue;
            }

            // If the lowest cost isn't at least 1 less than the current cost for the destination,
            // it can't improve the best path to the destination.
            if (base >= endNode.cost - 1) break;

            // Get the cheapest node from the heap.
            Node v0 = heap[0];
            heap[0] = v0.remove();
            if (heap[0] == v0) heap[0] = null;

            // Relax the edges from v0.
            int g_v0 = v0.cost;
            // O(hash * #neighbours)
            for (String v1Str : wordsToWords.get(v0.key))
            {
                Node v1 = nodes.get(v1Str);
                if (v1 == null) {
                    v1 = new Node(v1Str, dest);
                    nodes.put(v1Str, v1);
                }

                // If it's an improvement, use it.
                if (g_v0 + 1 < v1.cost)
                {
                    // Update the heap.
                    if (v1.cost < Node.INFINITY)
                    {
                        int bucket = v1.cost + v1.heuristic - base;
                        Node t = v1.remove();
                        if (heap[bucket] == v1) heap[bucket] = t == v1 ? null : t;
                    }

                    // Next update the backpointer and the costs map.
                    v1.backpointer = v0;
                    v1.cost = g_v0 + 1;

                    int bucket = v1.cost + v1.heuristic - base;
                    if (heap[bucket] == null) {
                        heap[bucket] = v1;
                    }
                    else {
                        v1.next = heap[bucket];
                        v1.prev = v1.next.prev;
                        v1.next.prev = v1.prev.next = v1;
                    }
                }
            }
        }

        if (endNode.backpointer == null) {
            System.out.println(start);
            System.out.println(dest);
            System.out.println("OY");
        }
        else {
            String[] path = new String[endNode.cost + 1];
            Node t = endNode;
            for (int i = t.cost; i >= 0; i--) {
                path[i] = t.key;
                t = t.backpointer;
            }
            for (String str : path) System.out.println(str);
            System.out.println(path.length - 2);
        }
    }

    private static <K, V> void add(Map<K, Set<V>> map, K key, V value) {
        Set<V> vals = map.get(key);
        if (vals == null) map.put(key, vals = new HashSet<V>());
        vals.add(value);
    }

    private static class Node
    {
        public static int INFINITY = Integer.MAX_VALUE >> 1;

        public String key;
        public int cost;
        public int heuristic;
        public Node backpointer;

        public Node prev = this;
        public Node next = this;

        public Node(String key, String dest) {
            this.key = key;
            cost = INFINITY;
            for (int i = 0; i < dest.length(); i++) if (dest.charAt(i) != key.charAt(i)) heuristic++;
        }

        public Node remove() {
            Node rv = next;
            next.prev = prev;
            prev.next = next;
            next = prev = this;
            return rv;
        }
    }
}

As you can see, the running cost analysis is O(filelength + num_words * hash + V * n * (n + hash) + E * hash). If you'll accept my assumption that a hash table insertion/lookup is constant time, that's O(filelength + V n^2 + E). The particular statistics of the graphs in SOWPODS mean that O(V n^2) really dominates O(E) for most n.
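
Plugging in the 5-letter statistics quoted below (V = 12478, E = 40759, n = 5): V n^2 = 12478 * 25, roughly 312,000, which comfortably dominates E at roughly 41,000.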

Sample outputs:

IDOLA, IDOLS, IDYLS, ODYLS, ODALS, OVALS, OVELS, OVENS, EVENS, ETENS, STENS, SKENS, SKINS, SPINS, SPINE, 13

WICCA, PROSY, OY

BRINY, BRINS, TRINS, TAINS, TARNS, YARNS, YAWNS, YAWPS, YAPPS, 7

GALES, GASES, GASTS, GESTS, GESTE, GESSE, DESSE, 5

SURES, DURES, DUNES, DINES, DINGS, DINGY, 4

LICHT, LIGHT, BIGHT, BIGOT, BIGOS, BIROS, GIROS, GIRNS, GURNS, GUANS, GUANA, RUANA, 10

SARGE, SERGE, SERRE, SERRS, SEERS, DEERS, DYERS, OYERS, OVERS, OVELS, OVALS, ODALS, ODYLS, IDYLS, 12

KEIRS, SEIRS, SEERS, BEERS, BRERS, BRERE, BREME, CREME, CREPE, 7

This is one of the 6 pairs with the longest shortest path:

GAINEST, FAINEST, FAIREST, SAIREST, SAIDEST, SADDEST, MADDEST, MIDDEST, MILDEST, WILDEST, WILIEST, WALIEST, WANIEST, CANIEST, CANTEST, CONTEST, CONFEST, CONFESS, CONFERS, CONKERS, COOKERS, COOPERS, COPPERS, POPPERS, POPPETS, POPPITS, POPPIES, POPSIES, MOPSIES, MOUSIES, MOUSSES, POUSSES, PLUSSES, PLISSES, PRISSES, PRESSES, PREASES, UREASES, UNEASES, UNCASES, UNCASED, UNBASED, UNBATED, UNMATED, UNMETED, UNMEWED, ENMEWED, ENDEWED, INDEWED, INDEXED, INDEXES, INDENES, INDENTS, INCENTS, INCESTS, INFESTS, INFECTS, INJECTS, 56

And one of the worst-case soluble 8-letter pairs:

ENROBING, UNROBING, UNROPING, UNCOPING, UNCAPING, UNCAGING, ENCAGING, ENRAGING, ENRACING, ENLACING, UNLACING, UNLAYING, UPLAYING, SPLAYING, SPRAYING, STRAYING, STROYING, STROKING, STOOKING, STOOPING, STOMPING, STUMPING, SLUMPING, CLUMPING, CRUMPING, CRIMPING, CRISPING, CRISPINS, CRISPENS, CRISPERS, CRIMPERS, CRAMPERS, CLAMPERS, CLASPERS, CLASHERS, SLASHERS, SLATHERS, SLITHERS, SMITHERS, SMOTHERS, SOOTHERS, SOUTHERS, MOUTHERS, MOUCHERS, COUCHERS, COACHERS, POACHERS, POTCHERS, PUTCHERS, PUNCHERS, LUNCHERS, LYNCHERS, LYNCHETS, LINCHETS, 52

Now that I think I've got all the requirements of the question out of the way, on to my discussion.

For a computer scientist, the question obviously reduces to shortest path in a graph G whose vertices are words and whose edges connect words differing in one letter. Generating the graph efficiently isn't trivial - I actually have an idea I need to revisit to reduce the complexity to O(V n hash + E). The way I do it involves creating a graph which inserts extra vertices (corresponding to words with one wildcard character) and is homeomorphic to the graph in question. I did consider using that graph rather than reducing to G - and I suppose that from a golfing point of view I should have done so - on the basis that a wildcard node with more than 3 edges reduces the number of edges in the graph, and the standard worst-case running time of shortest-path algorithms is O(V heap-op + E).

However, the first thing I did was to run some analyses of the graphs G for different word lengths, and I discovered that they're extremely sparse for words of 5 or more letters. The 5-letter graph has 12478 vertices and 40759 edges; adding link nodes makes the graph worse. By the time you're up to 8 letters there are fewer edges than nodes, and 3/7 of the words are "aloof". So I rejected that optimisation idea as not really helpful.

The idea which did prove helpful was to examine the heap. I can honestly say that I've implemented some moderately exotic heaps in the past, but none as exotic as this. I use A-star (since C provides no benefit given the heap I'm using) with the obvious heuristic of number of letters different from the target, and a bit of analysis shows that at any time there are no more than 3 different priorities in the heap. When I pop a node whose priority is (cost + heuristic) and look at its neighbours, there are three cases I'm considering: 1) neighbour's cost is cost+1 and its heuristic is heuristic-1 (because the letter it changes becomes "correct"); 2) cost+1 and heuristic+0 (because the letter it changes goes from "wrong" to "still wrong"); 3) cost+1 and heuristic+1 (because the letter it changes goes from "correct" to "wrong"). So if I relax the neighbour I'm going to insert it at the same priority, priority+1, or priority+2. As a result I can use a 3-element array of linked lists for the heap.
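
To make that concrete, here is a stripped-down sketch of the idea (illustrative only, not the code above: it supports only insert and pop-min, whereas the real heap also removes arbitrary nodes when a priority improves, hence the intrusive doubly-linked lists):

import java.util.ArrayDeque;

// Sketch: a priority queue where every insertion is guaranteed to be
// within +2 of the smallest live priority, so three FIFO buckets
// replace a general-purpose heap.
class ThreeBucketQueue<T> {
    private final ArrayDeque<T>[] buckets;
    private int base; // the priority represented by buckets[0]

    @SuppressWarnings("unchecked")
    ThreeBucketQueue(int basePriority) {
        buckets = new ArrayDeque[] {
            new ArrayDeque<T>(), new ArrayDeque<T>(), new ArrayDeque<T>()
        };
        base = basePriority;
    }

    // Caller must guarantee base <= priority <= base + 2.
    void add(int priority, T item) {
        buckets[priority - base].addLast(item);
    }

    // Amortised O(1): at most two shifts before a non-empty bucket is found.
    T poll() {
        while (buckets[0].isEmpty()) {
            if (buckets[1].isEmpty() && buckets[2].isEmpty()) return null;
            ArrayDeque<T> empty = buckets[0];
            buckets[0] = buckets[1];
            buckets[1] = buckets[2];
            buckets[2] = empty;
            base++;
        }
        return buckets[0].pollFirst();
    }
}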

I should add a note about my assumption that hash lookups are constant. Very well, you may say, but what about hash computations? The answer is that I'm amortising them away: java.lang.String caches its hashCode(), so the total time spent computing hashes is O(V n^2) (in generating the graph).

There is another change which affects the complexity, but the question of whether it is an optimisation or not depends on your assumptions about statistics. (IMO putting "the best Big O solution" as a criterion is a mistake because there isn't a best complexity, for a simple reason: there isn't a single variable). This change affects the graph generation step. In the code above, it's:

        Map<String, Set<String>> wordsToLinks = new HashMap<String, Set<String>>();
        Map<String, Set<String>> linksToWords = new HashMap<String, Set<String>>();

        // Cost: O(Vn * (n + hash))
        for (String word : words)
        {
            // Cost: O(n*(n + hash))
            for (int i = 0; i < word.length(); i++)
            {
                // Cost: O(n + hash)
                char[] ch = word.toCharArray();
                ch[i] = '.';
                String link = new String(ch).intern();
                add(wordsToLinks, word, link);
                add(linksToWords, link, word);
            }
        }

        // Cost: O(V * n * hash + E * hash)
        for (Map.Entry<String, Set<String>> from : wordsToLinks.entrySet()) {
            String src = from.getKey();
            wordsToWords.put(src, new HashSet<String>());
            for (String link : from.getValue()) {
                Set<String> to = linksToWords.get(link);
                for (String snk : to) {
                    // Note: equality test is safe here. Cost is O(hash)
                    if (snk != src) add(wordsToWords, src, snk);
                }
            }
        }

That's O(V * n * (n + hash) + E * hash). But the O(V * n^2) part comes from generating a new n-character string for each link and then computing its hashcode. This can be avoided with a helper class:

    private static class Link
    {
        private String str;
        private int hash;
        private int missingIdx;

        public Link(String str, int hash, int missingIdx) {
            this.str = str;
            this.hash = hash;
            this.missingIdx = missingIdx;
        }

        @Override
        public int hashCode() { return hash; }

        @Override
        public boolean equals(Object obj) {
            Link l = (Link)obj; // Unsafe, but I know the contexts where I'm using this class...
            if (this == l) return true; // Essential
            if (hash != l.hash || missingIdx != l.missingIdx) return false;
            for (int i = 0; i < str.length(); i++) {
                if (i != missingIdx && str.charAt(i) != l.str.charAt(i)) return false;
            }
            return true;
        }
    }

Then the first half of the graph generation becomes

        Map<String, Set<Link>> wordsToLinks = new HashMap<String, Set<Link>>();
        Map<Link, Set<String>> linksToWords = new HashMap<Link, Set<String>>();

        // Cost: O(V * n * hash)
        for (String word : words)
        {
            // apidoc: The hash code for a String object is computed as
            // s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
            // Cost: O(n * hash)
            int hashCode = word.hashCode();
            int pow = 1;
            for (int j = word.length() - 1; j >= 0; j--) {
                Link link = new Link(word, hashCode - word.charAt(j) * pow, j);
                add(wordsToLinks, word, link);
                add(linksToWords, link, word);
                pow *= 31;
            }
        }
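
As a sanity check on that arithmetic (an illustrative snippet, not part of the solution): blanking position j removes the term word.charAt(j) * 31^(n-1-j) from the hash, so two words that agree everywhere except position j produce equal link hashes:

public class LinkHashDemo {
    public static void main(String[] args) {
        String w1 = "DIME", w2 = "DINE"; // differ only at index 2
        int j = 2;
        int pow = 31; // 31^(4 - 1 - 2)
        int h1 = w1.hashCode() - w1.charAt(j) * pow;
        int h2 = w2.hashCode() - w2.charAt(j) * pow;
        System.out.println(h1 == h2); // prints true
    }
}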

By using the structure of the hashcode we can generate the links in O(V * n). However, this has a knock-on effect. Inherent in my assumption that hash lookups are constant time is an assumption that comparing objects for equality is cheap. However, the equality test of Link is O(n) in the worst case. The worst case is when we have a hash collision between two equal links generated from different words - i.e. it occurs O(E) times in the second half of the graph generation. Other than that, except in the unlikely event of a hash collision between non-equal links, we're good. So we've traded in O(V * n^2) for O(E * n * hash). See my previous point about statistics.

Peter Taylor

Posted 2011-05-05T18:21:20.110

Reputation: 41 901

I believe 8192 is the default buffer size for BufferedReader (on SunVM) – st0le – 2011-05-09T04:04:19.470

@st0le, I omitted that parameter in the golfed version, and it doesn't harm in the ungolfed one. – Peter Taylor – 2011-05-09T21:37:09.173

Java

Complexity: ?? (I don't have a CompSci degree, so I'd appreciate help in this matter.)

Input: Provide pairs of words (more than 1 pair if you wish) on the command line. If no command-line arguments are specified, 2 distinct random words are chosen.

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

public class M {

    // for memoization
    private static Map<String, List<String>> memoEdits = new HashMap<String, List<String>>(); 
    private static Set<String> dict;

    private static List<String> edits(String word, Set<String> dict) {
        if(memoEdits.containsKey(word))
            return memoEdits.get(word);

        List<String> editsList = new LinkedList<String>();
        char[] letters = word.toCharArray();
        for(int i = 0; i < letters.length; i++) {
            char hold = letters[i];
            for(char ch = 'A'; ch <= 'Z'; ch++) {
                if(ch != hold) {
                    letters[i] = ch;
                    String nWord = new String(letters);
                    if(dict.contains(nWord)) {
                        editsList.add(nWord);
                    }
                }
            }
            letters[i] = hold;
        }
        memoEdits.put(word, editsList);
        return editsList;
    }

    private static Map<String, String> bfs(String wordFrom, String wordTo,
                                           Set<String> dict) {
        Set<String> visited = new HashSet<String>();
        List<String> queue = new LinkedList<String>();
        Map<String, String> pred = new HashMap<String, String>();
        queue.add(wordFrom);
        while(!queue.isEmpty()) {
            String word = queue.remove(0);
            if(word.equals(wordTo))
                break;

            for(String nWord: edits(word, dict)) {
                if(!visited.contains(nWord)) {
                    queue.add(nWord);
                    visited.add(nWord);
                    pred.put(nWord, word);
                }
            }
        }
        return pred;
    }

    public static void printPath(String wordTo, String wordFrom) {
        int c = 0;
        Map<String, String> pred = bfs(wordFrom, wordTo, dict);
        do {
            System.out.println(wordTo);
            c++;
            wordTo = pred.get(wordTo);
        }
        while(wordTo != null && !wordFrom.equals(wordTo));
        System.out.println(wordFrom);
        if(wordTo != null)
            System.out.println(c - 1);
        else
            System.out.println("OY");
        System.out.println();
    }

    public static void main(String[] args) throws Exception {
        BufferedReader scan = new BufferedReader(new FileReader(new File("c:\\332609\\dict.txt")),
                                                 40 * 1024);
        String line;
        dict = new HashSet<String>(); //the dictionary (1 word per line)
        while((line = scan.readLine()) != null) {
            dict.add(line);
        }
        scan.close();
        if(args.length == 0) { // No Command line Arguments? Pick 2 random
                               // words.
            Random r = new Random(System.currentTimeMillis());
            String[] words = dict.toArray(new String[dict.size()]);
            int x = r.nextInt(words.length), y = r.nextInt(words.length);
            while(x == y) //same word? that's not fun...
                y = r.nextInt(words.length);
            printPath(words[x], words[y]);
        }
        else { // Arguments provided, search for path pairwise
            for(int i = 0; i < args.length; i += 2) {
                if(i + 1 < args.length)
                    printPath(args[i], args[i + 1]);
            }
        }
    }
}

st0le

Posted 2011-05-05T18:21:20.110

Reputation: 2 002

I've used Memoization, for quicker results. The Dictionary path is hardcoded. – st0le – 2011-05-06T05:31:11.110

@Joey, it used to be but not any more. It now has a static field which it increments each time and adds to System.nanoTime(). – Peter Taylor – 2011-05-06T05:46:03.897

@Joey, aah, ok, but i'll leave it for now, don't want to increment my revisions :P – st0le – 2011-05-06T05:46:21.333

oh, btw, i'm at work and those scrabble websites are apparently blocked so i don't have access to the dictionaries...will generate those 10 unique words best by tomorrow morning. Cheers! – st0le – 2011-05-06T07:08:46.910

You could reduce the (computational) complexity by doing a bidirectional bfs, i.e. search from both sides and stop when you encounter a node visited from the other side. – Nabb – 2011-05-06T08:16:32.520

@Nabb, will look into it, Actually, I already had this program in my archive just modified to suit the OP's requirements. – st0le – 2011-05-06T08:37:24.320

@Nabb, does that reduce the complexity? Surely the worst case requires visiting all edges. – Peter Taylor – 2011-05-06T22:30:31.483

@Peter: Depends on how you look at it. It's possible to even construct cases where a bidirectional bfs is inferior (e.g. S--x--x--x--F--x). If we consider the number of edges from edge node to be a constant C, then I think we expect about C^H(S,F) for single direction bfs and 2*C^(H(S,F)/2) for bidirectional, with H(S,F) being the hamming distance between S and F. – Nabb – 2011-05-07T01:03:30.623

@Nabb, the worst case is when there is no solution. I agree that bidirectional search done well can improve the average running time for some graphs (probably up to 5-letter words, given the actual statistics). – Peter Taylor – 2011-05-07T08:42:53.850

C on Unix

Using Dijkstra's algorithm.

A large fraction of the code is a custom n-ary tree implementation, which serves to hold:

  • The word list (thus minimizing the number of times the input file is read: twice for no arguments, once for other cases), on the assumption that file I/O is slow.
  • The partial trees as we build them.
  • The final path.

Anyone interested in seeing how it works should probably read findPath, process, and processOne (and their associated comments). And maybe buildPath and buildPartialPath. The rest is bookkeeping and scaffolding. Several routines used during testing and development but not in the "production" version have been left in place.

I'm using /usr/share/dict/words on my Mac OS 10.5 box, which has so many long, esoteric entries that letting it run completely at random generates a lot of OYs.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h> /* strcasecmp */
#include <getline.h>
#include <time.h>
#include <unistd.h>
#include <ctype.h>

const char*wordfile="/usr/share/dict/words";
/* const char*wordfile="./testwords.txt"; */
const long double RANDOM_MAX = (1LL<<31)-1; /* random() returns values in [0, 2^31-1] */

typedef struct node_t {
  char*word;
  struct node_t*kids;
  struct node_t*next;
} node;


/* Return a pointer to a newly allocated node. If word is non-NULL, 
 * call setWordNode;
 */
node*newNode(char*word){
  node*n=malloc(sizeof(node));
  n->word=NULL;
  n->kids=NULL;
  n->next=NULL;
  if (word) n->word = strdup(word);
  return n;
}
/* We can use the "next" links to treat these as a simple linked list,
 * and further can make it a stack or queue by
 *
 * * pop()/deQueu() from the head
 * * push() onto the head
 * * enQueue at the back
 */
void push(node*n, node**list){
  if (list==NULL){
    fprintf(stderr,"Active operation on a NULL list! Exiting\n");
    exit(5);
  }
  n->next = (*list);
  (*list) = n;
}
void enQueue(node*n, node**list){
  if (list==NULL){
    fprintf(stderr,"Active operation on a NULL list! Exiting\n");
    exit(5);
  }
  if ( *list==NULL ) {
    *list=n;
  } else {
    enQueue(n,&((*list)->next));
  }
}
node*pop(node**list){
  node*temp=NULL;
  if (list==NULL){
    fprintf(stderr,"Active operation on a NULL list! Exiting\n");
    exit(5);
  }
  temp = *list;
  if (temp != NULL) {
    (*list) = temp->next;
    temp->next=NULL;
  }
  return temp;
}
node*deQueue(node**list){ /* Alias for pop */
  return pop(list);
}

/* return a pointer to a node in tree matching word or NULL if none */
node* isInTree(char*word, node*tree){
  node*isInNext=NULL;
  node*isInKids=NULL;
  if (tree==NULL || word==NULL) return NULL;
  if (tree->word && (0 == strcasecmp(word,tree->word))) return tree;
  /* prefer to find the target at shallow levels so check the siblings
     before the kids */
  if (tree->next && (isInNext=isInTree(word,tree->next))) return isInNext;
  if (tree->kids && (isInKids=isInTree(word,tree->kids))) return isInKids;
  return NULL;
}

node* freeTree(node*t){
  if (t==NULL) return NULL;
  if (t->word) {free(t->word); t->word=NULL;}
  if (t->next) t->next=freeTree(t->next);
  if (t->kids) t->kids=freeTree(t->kids);
  free(t);
  return NULL;
}

void printTree(node*t, int indent){
  int i;
  if (t==NULL) return;
  for (i=0; i<indent; i++) printf("\t"); printf("%s\n",t->word);
  printTree(t->kids,indent+1);
  printTree(t->next,indent);
}

/* count the letters of difference between two strings */
int countDiff(const char*w1, const char*w2){
  int count=0;
  if (w1==NULL || w2==NULL) return -1;
  while ( (*w1)!='\0' && (*w2)!='\0' ) {
    if ( (*w1)!=(*w2) ) count++;
    w1++;
    w2++;
  }
  return count;
}

node*buildPartialPath(char*stop, node*tree){
  node*list=NULL;
  while ( (tree != NULL) && 
      (tree->word != NULL) && 
      (0 != strcasecmp(tree->word,stop)) ) {
    node*kid=tree->kids;
    node*newN = newNode(tree->word);
    push(newN,&list);
    newN=NULL;
    /* walk over all kids not leading to stop */
    while ( kid && 
        (strcasecmp(kid->word,stop)!=0) &&
        !isInTree(stop,kid->kids) ) {
      kid=kid->next;
    }
    if (kid==NULL) {
      /* Assuming a preconditions where isInTree(stop,tree), we should
       * not be able to get here...
       */
      fprintf(stderr,"Unpossible!\n");
      exit(7);
    } 
    /* Here we've found a node that either *is* the target or leads to it */
    if (strcasecmp(stop,kid->word) == 0) {
      break;
    }
    tree = kid;
  }
  return list; 
}
/* build a node list path 
 *
 * We can walk down each tree, identifying nodes as we go
 */
node*buildPath(char*pivot,node*frontTree,node*backTree){
  node*front=buildPartialPath(pivot,frontTree);
  node*back=buildPartialPath(pivot,backTree);
  /* weld them together with pivot in between 
  *
  * The front list is in reverse order, the back list in order
  */
  node*thePath=NULL;
  while (front != NULL) {
    node*n=pop(&front);
    push(n,&thePath);
  }
  if (pivot != NULL) {
    node*n=newNode(pivot);
    enQueue(n,&thePath);
  }
  while (back != NULL) {
    node*n=pop(&back);
    enQueue(n,&thePath);
  }
  return thePath;
}

/* Add new child nodes to the single node in ts named by word. Also
 * queue these new words in q
 * 
 * Find node N matching word in ts
 * For tword in wordList
 *    if (tword is one change from word) AND (tword not in ts)
 *        add tword to N.kids
 *        add tword to q
 *        if tword in to
 *           return tword
 * return NULL
 */
char* processOne(char *word, node**q, node**ts, node**to, node*wordList){
  if ( word==NULL || q==NULL || ts==NULL || to==NULL || wordList==NULL ) {
    fprintf(stderr,"ProcessOne called with NULL argument! Exiting.\n");
    exit(9);
  }
  char*result=NULL;
  /* There should be a node in ts matching the leading node of q, find it */
  node*here = isInTree(word,*ts);
  /* Now test each word in the list as a possible child of HERE */
  while (wordList != NULL) {
    char *tword=wordList->word;
    if ((1==countDiff(word,tword)) && !isInTree(tword,*ts)) {
      /* Queue this up as a child AND for further processing */
      node*newN=newNode(tword);
      enQueue(newN,&(here->kids));
      newN=newNode(tword);
      enQueue(newN,q);
      /* This might be our pivot */
      if ( isInTree(tword,*to) ) {
    /* we have found a node that is in both trees */
    result=strdup(tword);
    return result;
      }
    }
    wordList=wordList->next;
  }
  return result;
}

/* Add new child nodes to ts for all the words in q */
char* process(node**q, node**ts, node**to, node*wordList){
  node*tq=NULL;
  char*pivot=NULL;
  if ( q==NULL || ts==NULL || to==NULL || wordList==NULL ) {
    fprintf(stderr,"Process called with NULL argument! Exiting.\n");
    exit(9);
  }
  while (*q && (pivot=processOne((*q)->word,&tq,ts,to,wordList))==NULL) {
    freeTree(deQueue(q));
  }
  freeTree(*q); 
  *q=tq;
  return pivot;
}

/* Find a path between w1 and w2 using wordList by Dijkstra's
 * algorithm (with unit edge weights this amounts to a bidirectional
 * breadth-first search).
 *
 * Use breadth-first extensions of the trees, alternating between
 * trees.
 */
node* findPath(char*w1, char*w2, node*wordList){
  node*thePath=NULL; /* our resulting path */
  char*pivot=NULL; /* The node we find that matches */
  /* trees of existing nodes */
  node*t1=newNode(w1); 
  node*t2=newNode(w2);
  /* queues of nodes to work on */
  node*q1=newNode(w1);
  node*q2=newNode(w2);

  /* work each queue all the way through alternating until a word is
     found in both lists */
  while( (q1!=NULL) && ((pivot = process(&q1,&t1,&t2,wordList)) == NULL) &&
     (q2!=NULL) && ((pivot = process(&q2,&t2,&t1,wordList)) == NULL) )
    /* no loop body */ ;


  /* one way or another we are done with the queues here */
  q1=freeTree(q1);
  q2=freeTree(q2);
  /* now construct the path */
  if (pivot!=NULL) thePath=buildPath(pivot,t1,t2);
  /* clean up after ourselves */
  t1=freeTree(t1);
  t2=freeTree(t2);

  return thePath;
}

/* Convert a non-const string to UPPERCASE in place */
void upcase(char *s){
  while (s && *s) {
    *s = toupper(*s);
    s++;
  }
}

/* Walks the input file stuffing lines of the given length into a list */
node*getListWithLength(const char*fname, int len){
  int l=-1;
  size_t n=0;
  node*list=NULL;
  char *line=NULL;
  /* open the word file */
  FILE*f = fopen(fname,"r");
  if (NULL==f){
    fprintf(stderr,"Could not open word file '%s'. Exiting.\n",fname);
    exit(3);
  }
  /* walk the file, trying each word in turn */
  while ( !feof(f) && ((l = getline(&line,&n,f)) != -1) ) {
    /* strip trailing whitespace */
    char*temp=line;
    strsep(&temp," \t\n");
    if (strlen(line) == len) {
      node*newN = newNode(line);
      upcase(newN->word);
      push(newN,&list);
    }
  }
  fclose(f);
  return list;
}

/* Assumes that filename points to a file containing exactly one
 * word per line with no other whitespace.
 * It will return a randomly selected word from filename.
 *
 * If veto is non-NULL, only non-matching words of the same length
 * will be considered.
 */
char*getRandomWordFile(const char*fname, const char*veto){
  int l=-1, count=1;
  size_t n=0;
  char *word=NULL;
  char *line=NULL;
  /* open the word file */
  FILE*f = fopen(fname,"r");
  if (NULL==f){
    fprintf(stderr,"Could not open word file '%s'. Exiting.\n",fname);
    exit(3);
  }
  /* walk the file, trying each word in turn */
  while ( !feof(f) && ((l = getline(&line,&n,f)) != -1) ) {
    /* strip trailing whitespace */
    char*temp=line;
    strsep(&temp," \t\n");
    if (strlen(line) < 2) continue; /* Single letters are too easy! */
    if ( (veto==NULL) || /* no veto means chose from all */ 
     ( 
      ( strlen(line) == strlen(veto) )  && /* veto means match length */
      ( 0 != strcasecmp(veto,line) )       /* but don't match word */ 
       ) ) { 
      /* This word is worthy of consideration. Select it with random
         chance (1/count) then increment count */
      if ( (word==NULL) || (random() < RANDOM_MAX/count) ) {
    if (word) free(word);
    word=strdup(line);
      }
      count++;
    }
  }
  fclose(f);
  upcase(word);
  return word;
}

void usage(int argc, char**argv){
  fprintf(stderr,"%s [ <startWord> [ <endWord> ]]:\n\n",argv[0]);
  fprintf(stderr,
      "\tFind the shortest transformation from one word to another\n");
  fprintf(stderr,
      "\tchanging only one letter at a time and always maintaining a\n");
  fprintf(stderr,
      "\tword that exists in the word file.\n\n");
  fprintf(stderr,
      "\tIf startWord is not passed, chose at random from '%s'\n",
      wordfile);
  fprintf(stderr,
      "\tIf endWord is not passed, chose at random from '%s'\n",
      wordfile);
  fprintf(stderr,
      "\tconsistent with the length of startWord\n");
  exit(2);
}

int main(int argc, char**argv){
  char *startWord=NULL;
  char *endWord=NULL;

  /* initialize OS services */
  srandom(time(0)+getpid());
  /* process command line */
  switch (argc) {
  case 3:
    endWord = strdup(argv[2]);
    upcase(endWord);
  case 2:
    startWord = strdup(argv[1]);
    upcase(startWord);
  case 1:
    if (NULL==startWord) startWord = getRandomWordFile(wordfile,NULL);
    if (NULL==endWord)   endWord   = getRandomWordFile(wordfile,startWord);
    break;
  default:
    usage(argc,argv);
    break;
  }
  /* need to check this in case the user screwed up */
  if ( !startWord || ! endWord || strlen(startWord) != strlen(endWord) ) {
    fprintf(stderr,"Words '%s' and '%s' are not the same length! Exiting\n",
        startWord,endWord);
    exit(1);
  }
  /* Get a list of all the words having the right length */
  node*wordList=getListWithLength(wordfile,strlen(startWord));
  /* Launch into the path finder*/
  node *theList=findPath(startWord,endWord,wordList);
  /* Print the resulting path */
  if (theList) {
    int count=-2;
    while (theList) {
      printf("%s\n",theList->word);
      theList=theList->next;
      count++;
    }
    printf("%d\n",count);
  } else {
    /* No path found case */
    printf("%s %s OY\n",startWord,endWord);
  }
  return 0;
}

Some output:

$ ./changeword dive name
DIVE
DIME
DAME
NAME
2
$ ./changeword house gorge
HOUSE
HORSE
GORSE
GORGE
2
$ ./changeword stop read
STOP
STEP
SEEP
SEED
REED
READ
4
$ ./changeword peace slate
PEACE
PLACE
PLATE
SLATE
2
$ ./changeword pole fast  
POLE
POSE
POST
PAST
FAST
3
$ ./changeword          
QUINTIPED LINEARITY OY
$ ./changeword sneaky   
SNEAKY WAXILY OY
$ ./changeword TRICKY
TRICKY
PRICKY
PRINKY
PRANKY
TRANKY
TWANKY
SWANKY
SWANNY
SHANNY
SHANTY
SCANTY
SCATTY
SCOTTY
SPOTTY
SPOUTY
STOUTY
STOUTH
STOUSH
SLOUSH
SLOOSH
SWOOSH
19
$ ./changeword router outlet
ROUTER
ROTTER
RUTTER
RUTHER
OUTHER
OUTLER
OUTLET
5
$ ./changeword 
IDIOM
IDISM
IDIST
ODIST
OVIST
OVEST
OVERT
AVERT
APERT
APART
SPART
SPARY
SEARY
DEARY
DECRY
DECAY
DECAN
DEDAN
SEDAN
17

Complexity analysis is non-trivial. The search is a two-sided, iteratively deepening one.

  • For each node examined I walk the entire word list (though limited to words of the right length). Call the length of the list W.
  • The minimum number of steps is S_min = (number of different letters) - 1, because if we are only one letter apart we score the change at 0 intermediate steps. The maximum is hard to quantify; see the TRICKY--SWOOSH run above. Each half of the tree will be S/2 - 1 to S/2 deep.
  • I have not done an analysis of the branching behavior of the tree, but call it B.

So the minimum number of operations is around 2 * B^(S/2) * W, not really good.
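
For a rough sense of scale (illustrative numbers only): taking the 5-letter word count W = 12478 quoted in the Java answer above, and guessing B = 3 with S = 6, that estimate gives 2 * 3^3 * 12478, roughly 670,000 word comparisons.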

dmckee --- ex-moderator kitten

Posted 2011-05-05T18:21:20.110

Reputation: 2 726

Maybe this is naive of me, but I don't see anything in your design or implementation that requires edge weights. While Dijkstra's indeed works for unweighted graphs (edge weight is invariably "1"), wouldn't a simple breadth-first search apply here to improve your bounds to O(|V|+|E|) instead of O(|E|+|V| log |V|)? – MrGomez – 2012-05-16T19:30:11.367