Let's play Hangman?

8

4

According to this page, the best strategy to guess English hangman words is to calculate the odds of each letter in a word list that meet our conditions. But, as I'm really lazy, I don't really want to compute every word in the dictionary by myself. But, as I know that you are always here to help me, I'm sure you will be able to make me some king of a code that will do that for me. And, because my hard disk is nearly full, I'd like the smallest possible code. It means that this is code-golf, and the submission with the lowest amount of bytes will win, but also the most accurate one !.

Input/Output

A random word from this wordlist will be taken.

Your program should accept, in arguments, or by user input (popups, stdin, whatever),

  • Length of the word
  • Incorrect letter already found, or 0 if we just started the game, and you provided no incorrect letter.
  • Letters already found AND their position in the word

Example : ./hangsolver 6 XBZ 1P 4P 2E 6E Here, I picked the word "people". For clarity : P E _ P _ E (Wrong letters are X B and Z)

That means, in one game, I'll have to launch your script many times!

The output will be a single letter, your next try.

Rules

  • The one that will guess 10 words in less tries than the others will win.
  • In case of a tie, shortest code in bytes win.
  • If there is still a tie, the fastest program will win.
  • You may assume that there are only these words in the English language
  • I'll only try valid words from the wordlist.
  • I've got a good computer, CPU power won't be a problem ( but try to answer as fast as you can !)
  • You can't solve with an online solver, but you can download the wordlist, or have it passed as an argument. You can assume it will be named "wordlist.txt" and in the same directory as your script.
  • Your code must be able to run on a common OS. It could be windows, mac, or ubuntu/debian/CentOS or Redhat.
  • You can't use an external solver.
  • You can however shorten the URL to the wordlist.
  • This code-golf will end the first of September.
  • You MUST use the method described above.

Good luck !

Wordlist found here on SE.

WayToDoor

Posted 2015-08-17T09:37:18.220

Reputation: 459

2Shortest code in bytes will win, with best guess being the tiebreak? That means that my program that simply guesses any random letter that hasn't been used before will beat someone elses that actually tries to work out a good guess. You may want to rethink your scoring, or you are going to get trivial answers. – Level River St – 2015-08-17T09:46:31.070

"You MUST use the method described above.", I quote the rules. But I'll edit to make that the primary winning criteria – WayToDoor – 2015-08-17T09:47:48.947

1To be clear, "using the method described above" means guess the letter that appears in the largest number of the possible words that has not been guessed yet? – isaacg – 2015-08-17T09:49:48.307

Yes, exactly. Thanks for the typo edit ! – WayToDoor – 2015-08-17T09:50:50.320

1Shouldn't the input data in your example be "6 XBZ 1P 4P 2E 6E"? – Razvan – 2015-08-17T10:44:39.420

Oh yeah, sorry. I'll edit. – WayToDoor – 2015-08-17T11:10:25.897

You're right. Edited – WayToDoor – 2015-08-17T20:42:45.933

Answers

5

PowerShell, 248 246 241 bytes

$a=$args;$c=$a[0];$b=$a[2..$c];((gc wordlist.txt)-match"^$((1..$c|%{$e=,"[^$($a[1])]"*$c}{$e[$_-1]=(((@($b)-match"^$_\D")[0]-split$_)[-1],$e[$_-1]-ne'')[0]}{$e-join''}))$"-join''-split'\B'|?{!(@($b)-match$_).Count}|group|sort count)[-1].Name

Ungolfed

Well, as much as I could without changing the way it works:

$a=$args
$c=$a[0]
$b=$a[2..$c]
(
    (gc wordlist.txt) -match "^$(
        (
            1..$c | ForEach-Object -Begin {
                $e = ,"[^$($a[1])]" * $c
            } -Process {
                $e[$_-1] = (
                    ( ( @($b) -match "^$_\D" )[0] -split $_ )[-1] , $e[$_-1] -ne ''
                )[0]
            } -End {
                $e-join''
            }
        )
    )$" -join '' -split'\B' |
    Where-Object {
        -not (@($b) -match $_).Count
    } | 
    Group-Object |
    Sort-Object count
)[-1].Name

Breakdown

The approach I took here was to first generate a regular expression to get possible words out of the word list. Since I know the length of the word, and the letters that didn't work, I can make a regex out of that fairly easily.

So in the PEOPLE example, 6 letters with XBZ not being part of the word, I would look to generate ^PE[^XBZ]P[^XBZ]E$.

I'm exploiting the fact that Get-Content (gc) returns an array of lines, and the -match operator when used with an array on the left side, returns an array of matches instead of a bool, so I can quickly get a list of just words that are candidates, once I have the regex.

To generate the regex, I start with an array ($e) of the negative matching character class with $c elements ($c being the number of letters in the word). Iterating through the numbers 1 through $c, I check for a matched letter at that position, and if it exists, I replace the element in $e with that letter.

Once I've iterated through all of the positions, the final array is -joined (with empty string) and we have our regex.

So now I've got an array of all the possible words it could be. A quick -join with empty string on that, gives me one big concatenated string of all the words, the I split on \B (not a word boundary, if I split on empty string I will get 2 extra blank elements), so now I have an array of every letter in every possible word.

Piping that into Where-Object lets me filter out the letters that have already been matched. This part was a real pain. It had to deal with the list of matched letters (which include the position) being 1 element, more than 1 element, or 0 elements, hence forcing $b into an array first so that -match can operate on all them, but that (unfortunately in this case) returns an array, so we have to check .Count. Using !(thing).Count is a bit smaller than using (thing).Count-gt0.

Moving on, now we've got an array of all the single characters (as strings not as chars) from all of the words it could possibly be, minus the letters that were already guessed correctly.

Piping that into Group-Object gives me an object with the counts of each letter, so a quick pipe into Sort-Object count makes it easy to get the highest count. Rather than do (thing|sort count -des)[0] we can use (thing|sort count)[-1]. In PowerShell [-1] gets the last element. At this point we're still dealing with the objects that came from Group-Object so we get the .Name property which is the letter that appears the most.

Notes

  • Should work with PowerShell v3+; almost certainly will choke on 2.
  • Remember when you call a PowerShell script, pass arguments with spaces, not commas.
  • Although I didn't see it in the rules, it looks like everyone is using the filename wordlist.txt otherwise that could shave a few bytes off.
  • Speed shouldn't be a problem. This appears to run instantly for me. The slowest run I could get it to do (.\hangman.ps1 7 0) runs in about 350ms.

briantist

Posted 2015-08-17T09:37:18.220

Reputation: 3 110

1Welcome to Programming Puzzles & Code Golf Stack Exchange, great first answer! :) – Doorknob – 2015-08-21T02:51:14.143

@Doorknob thanks very much! – briantist – 2015-08-21T03:32:55.023

6

Python3, 299 Bytes

import sys,collections as c;x,X,N,*Z=sys.argv;print([x for x in c.Counter(''.join([''.join(x)for x in map(set,filter(lambda l:len(l)==int(X)+1and all(x not in X.lower()for x in l)and all(l[int(x[0])-1]==x[1].lower()for x in Z),open('wordlist.txt')))]))if x not in ''.join(Z).lower()and x!='\n'][0])

pretty sure this can be golfed further.

Filters the wordlist for potential matches, builds a char frequency map, and selects the most often occuring character which is not yet picked.

ch3ka

Posted 2015-08-17T09:37:18.220

Reputation: 71

You have a lot of ''.join(..)s. If all the elements inside are strings of length 1, you can change it to '..'[2::5], where the apostrophes are backticks. – Kade – 2015-08-18T14:52:08.433

3

Java, 646 640 631 607 606 (short) 790 789 779 (fast) bytes

SHORT

import java.util.*;class I{public static void main(String[]a)throws Exception{char[]w=a[1].toCharArray(),p,q;int l=Integer.parseInt(a[0]),i,z=w.length,j;q=new char[l];for(i=2;i<a.length;i++)q[Character.getNumericValue(a[i].charAt(0))-1]=(char)(a[i].charAt(1)+32);java.io.File u=new java.io.File("wordlist.txt");Scanner s=new Scanner(u);while(s.hasNextLine()){p=s.nextLine().toCharArray();if(p.length==l)for(i=0;i<l;i++)if(p[i]==q[i]||q[i]=='\0'){if(i==l-1)y:for(i=0;i<l;i++)for(j=0;j<z;j++)if(!(p[i]==w[j])){if(j==z-1){System.out.print(p[new String(q).indexOf('\0')]);return;}}else break y;}else{break;}}}}

FAST

import java.util.*;class I{public static void main(String[]a)throws Exception{char[]w=a[1].toCharArray(),p,q;int l=Integer.parseInt(a[0]),i,z=w.length,j,k,o,n[]=new int[255],r[];q=new char[l];for(i=2;i<a.length;i++)q[Character.getNumericValue(a[i].charAt(0))-1]=(char)(a[i].charAt(1)+32);String m=new String(q);java.io.File u=new java.io.File("wordlist.txt");Scanner s=new Scanner(u);while(s.hasNextLine()){p=s.nextLine().toCharArray();h:if(p.length==l)for(i=0;i<l;i++)if(p[i]==q[i]||q[i]=='\0'){if(i==l-1)y:for(i=0;i<l;i++)for(j=0;j<z;j++)if(p[i]!=w[j]){if(j==z-1){for(k=0;k<l-m.replace("\0","").length();k++)n[(int)p[new String(q).indexOf('\0',k)]]++;break h;}}else break y;}else{break;}}r=n.clone();Arrays.sort(n);for(o=0;o<255;o++)System.out.print(r[o]==n[254]?(char)o:"");}}

Put the wordlist file in the folder.

Short version algorithm

  1. Load Args
  2. Construct the word we are trying to guess {'p','e','\0','p','\0','e'}
  3. Load WordList
  4. Go through each line of WordList
  5. Stop when you find that the whole word matches this condition p[i] == q[i] || q[i] == '\0' where p is a word from the wordlist (char array), and q is the word we are trying to guess
  6. Loop through wrong chars and compare to the word
  7. Print the first missing character

Long version algorithm

  1. Short steps 1-7
  2. Increment the character count in the n array for the missing chars
  3. Loop until all the words come through
  4. Print the character that had the highest count

Roberto Anić Banić

Posted 2015-08-17T09:37:18.220

Reputation: 131

can I remove the imports? – Roberto Anić Banić – 2015-08-17T14:05:21.057

Is it ok to remove java imports in code golf? – Roberto Anić Banić – 2015-08-17T14:12:23.557

it's ok, as long as you specify I Should import them (if not obvious) ;) – WayToDoor – 2015-08-17T14:21:10.137

Kk. I will update it when i come back home. Gonna go buy a new router :) and 90 feet of Cat5e – Roberto Anić Banić – 2015-08-17T14:36:06.603

1I don't think it works for words longer than 9 letters. Try using "prospective" from the wordlist with input "6 XBZ 1P 5P 6E 11E" Perhaps changing the first loop to q[Integer.parseInt(a[i].substring(0,a[i].length()-1))-1]=(char)(a[i].charAt(a[i].length()-1)+32); Also, a golfing tip: Try using Scanner s=new Scanner(new java.io.File("wordlist.txt")); – bmarks – 2015-08-17T21:48:07.533

2

PHP, 346 bytes

<?php $a='array_shift';$p='preg_match';$a($v=&$argv);$w=array_fill(1,$a($v),'(.)');$c=$a($v);foreach($v as$q)$p('/(\d+)(\w)/',$q,$m)&&$w[$m[1]]=$m[2];foreach(file('wordlist.txt')as$y)if($p("/^".implode($w)."$/",$y=strtoupper(trim($y)),$m)&&(!$c||!$p("/[$c]/",$y)))for($i=0;$i++<count($m);)($v=@++$g[$m[$i].$i])&&$v>@$t&&$t=$v&&$l=$m[$i];echo @$l;

It works as follows:

  1. Creates a regex pattern for matching the letters guessed so far
  2. Iterates over each word from the file
  3. If the word matches the regex, it makes sure that it doesn't contain one of the wrong letters
  4. Increments a counter for each of the possible letters of that word (based on their position)
  5. Outputs the letter with the highest counter

Assumptions:

  • PHP >=5.4
  • There is a wordlist.txt file in the current folder

Razvan

Posted 2015-08-17T09:37:18.220

Reputation: 1 361

`php hangman.php 6 YH 2E 6E 3O 1P 4P PHP Notice: Undefined offset: 2 in ./Desktop/hangman.php on line 1

Notice: Undefined offset: 2 in ./Desktop/hangman.php on line 1` Tried to make him guess people – WayToDoor – 2015-08-17T12:26:47.853

1Thanks for pointing it out. There was an small bug in the code. I updated it (still 346 bytes). – Razvan – 2015-08-17T12:39:33.213

1

Powershell, 153 bytes

Inspired by briantist's answer.

As other writers I used the filename wordlist.txt. Although it was possible to choose a shorter name.

param($l,$b,$c)((sls "^$(-join(1..$l|%{$p=$c|sls "$_."|% m*
("$p"[1],"[^$b$c]")[!$p]}))$" wordlist.txt|% l*e|% t*y|group|sort c*).Name-match"[^ $c]")[-1]

Less golfed test script:

$f = {

param($length,$bad,$chars)

$wordPattern=-join(1..$length|%{                  # join all subpatterns for $_ from 1 to $length
    $c=$chars|sls "$_."|% Matches                 # find subpattern in char array
    ("$c"[1],"[^$bad$chars]")[!$c]                # return a first char of subpattern if subpattern found, or bad characters
})

# Note 1: The word subpattern is similar to 'PE[^XBZ1P 4P 2E 6E]P[^XBZ1P 4P 2E 6E]E'
#         Spaces and digits does not affect the search for letters, only letters are important.
#
# Note 2: The same applies to 0. [^0] matches any letter.
#

$matches=sls "^$wordPattern$" wordlist.txt|% Line # find matched words in file wordlist.txt and return matched string only
$groups=$matches|% toCharArray|group|sort Count   # group all chars in matched words by count
$name=$groups.Name-match"[^ $chars]"              # select property Name from all grouped elements (chars itself) and return not matched to $chars only
$name[-1]                                         # return last element in the sorted array (most frequently found char)

# Note 3: The space is important in the regexp "[^ $chars]"
#         The space makes the regexp valid if the $chars array is empty

}

&$f 7 0 2o,5e,7t
&$f 7 nl 2o,5e,7t
&$f 6 XBZ 1P,4P,2E,6E

Output:

c
r
l

Variable values for &$f 7 0 2o,5e,7t:

$WordPattern: "[^02o 5e 7t]o[^02o 5e 7t][^02o 5e 7t]e[^02o 5e 7t]t"
$Matches: concept collect comment correct connect convert consent concert
$Groups:
    Count Name                      Group
    ----- ----                      -----
        1 p                         {p}
        1 v                         {v}
        1 s                         {s}
        2 m                         {m, m}
        2 l                         {l, l}
        4 r                         {r, r, r, r}
        8 n                         {n, n, n, n...}
        8 o                         {o, o, o, o...}
        8 t                         {t, t, t, t...}
        8 e                         {e, e, e, e...}
       13 c                         {c, c, c, c...}
$name: p v s m l r n c
$name[-1]: c
return: c

mazzy

Posted 2015-08-17T09:37:18.220

Reputation: 4 832