Solve the New York Times Spelling Bee

The New York Times periodically runs a puzzle they call the "Spelling Bee" in which they list six "allowed" letters plus one "mandatory" letter and challenge the reader to form as many five-letter-or-longer dictionary words as possible out of those letters. Each "allowed" letter can appear 0 or more times. The mandatory letter must appear at least once in each word. No other letters are allowed. Proper nouns are not acceptable answers.

For example, in last week's puzzle, the allowed letters were {"A", "C", "D", "H", "I", "R"}, while the mandatory letter was "N".

Allowed answers for this set of letters range from 5-letter words "ranch", "naiad", and "nadir", up to 9-letter words "cnidarian" and "circadian".

The more words you find, the higher the Times rates your intelligence. Your challenge: write a solver in the language of your choice.

Additional rules and tips:

  1. If your programming environment or operating system provides a list of words, you may use it. If not, you must include code for loading a word list from disk or the internet; you cannot assume it has been magically loaded into a variable outside your code.

  2. The word list should be generic: not limited to only those words that contain the mandatory letter, and not limited to words that meet the length or other requirements here. In other words, all dictionary searches should be done by your code, not by some pre-processor.

  3. If a word is capitalized in the dictionary, treat it as a proper noun. If your dictionary does not distinguish proper nouns, find a different dictionary.

  4. Different dictionaries will deliver different sets of answers, but given the test case above, you must return at least 20 qualifying answers, and the words "harridan", "niacin", and "circadian" must be among them. The words "Diana", "Adrian", and "India", as proper nouns, must not be.

  5. Your code should take the form of a function or program that accepts two inputs: a string or list of characters representing the allowed letters, and a character representing the required letter. It should return a list of allowed answers in any order.

  6. Except to the extent my rules here dictate otherwise, standard loopholes are forbidden.

  7. In the event my rule summary here deviates from the rules at the New York Times, follow the rules here.

  8. Shortest byte count wins, though I plan to call out excellent solutions that use novel algorithms or are otherwise interesting to me, assuming any such are submitted.
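The rules above can be sketched as a small reference solver. This Python version is only one possible approach, not taken from any answer below: each word's set of letters is encoded as a 26-bit integer, so "uses only allowed letters" and "contains the mandatory letter" become bitwise tests. It assumes a lowercase ASCII word list and takes the words as an iterable; the function names and the file name in the usage comment are illustrative.

```python
from typing import Iterable, List


def letter_mask(word: str) -> int:
    """Return a 26-bit mask with bit i set if chr(ord("a") + i) occurs in word.

    Assumes word contains only ASCII lowercase letters.
    """
    mask = 0
    for ch in word:
        mask |= 1 << (ord(ch) - ord("a"))
    return mask


def spelling_bee(allowed: str, required: str, words: Iterable[str]) -> List[str]:
    """Return the words from `words` that satisfy the Spelling Bee rules."""
    allowed_mask = letter_mask(allowed + required)
    required_bit = letter_mask(required)
    answers = []
    for word in words:
        # Five or more letters; any uppercase letter marks a proper noun (rule 3).
        # Entries with digits, hyphens, accents, etc. are skipped as well.
        if len(word) < 5 or not (word.isascii() and word.isalpha() and word.islower()):
            continue
        mask = letter_mask(word)
        # No letter outside the allowed set, and the mandatory letter present.
        if (mask & ~allowed_mask) == 0 and (mask & required_bit) != 0:
            answers.append(word)
    return answers


# Usage, assuming a hypothetical lowercase-ASCII word list named "words.txt":
# with open("words.txt", encoding="utf-8") as f:
#     print(spelling_bee("acdhir", "n", f.read().split()))
```

Because the allowed mask is computed once, each word costs a single pass over its characters plus two integer operations.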

Michael Stern

Posted 2017-02-25T23:03:41.407

Reputation: 3 029

I might recommend just specifying the appropriate wordlist and its file name, à la "Let's Play Hangman".

– briantist – 2017-02-26T00:33:04.120

Also the standard here is to allow a program or function, why is this limited to a function specifically?

– briantist – 2017-02-26T00:35:10.767

"you can not assume it has been magically loaded into a variable outside your code" - how about just accepting a list of words as an input to the (program or) function? – Jonathan Allan – 2017-02-26T00:58:02.770

The word india is also the code word for the letter I in international radio communication, and hence not a proper noun, so could be listed in the dictionary as such (CSW has it for example and that has no proper nouns). – Jonathan Allan – 2017-02-26T03:06:03.960

@briantist I suppose a program that follows the other rules would be fine. I will modify the terms accordingly. – Michael Stern – 2017-02-26T04:17:07.290

Answers

bash/dash/ash with grep and egrep, 31 30 29 bytes

egrep ^[$1$2]\{5,}$ z|grep $2

This assumes you have a word list in a file named z. The word list I used was kindly provided by briantist.
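For readers less familiar with egrep, here is a rough Python equivalent of the two-stage pipeline. As in the example below, the first argument is the allowed letters and the second the required letter; taking the lines of the word list as an iterable instead of reading z is an assumption of this sketch, and the function name is hypothetical.

```python
import re
from typing import Iterable, Iterator


def bee_pipeline(allowed: str, required: str, lines: Iterable[str]) -> Iterator[str]:
    """Rough equivalent of the egrep | grep pipeline above."""
    # Stage 1 (egrep): [$1$2] expands to one character class such as [acdhirn];
    # fullmatch stands in for the ^...$ anchors.
    only_allowed = re.compile("[" + re.escape(allowed + required) + "]{5,}")
    for line in lines:
        word = line.strip()  # drop the newline (and any carriage return)
        if only_allowed.fullmatch(word):
            # Stage 2 (grep $2): keep words that contain the required letter.
            if required in word:
                yield word
```

Like egrep, Python's re is case-sensitive by default, so an entry such as "Diana" fails the character class because of its uppercase "D".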

Example:

$ echo 'egrep ^[$1$2]\{5,}$ z|grep $2' > program; chmod +x program
$ ./program acdhir n | tr '\n' ' '
acarian acaridan acini ahind anana anarch anarchic arachnid arachnidan arcana arnica cairn canard cancan candid candida canid canna cannach caranna carina chain characin chicana china chinar chinch cinch circadian cnida cnidarian cranch crania dharna diarian dinar dinic drain handcar harridan inarch indican indicia indri iridian nadir naiad naira nanna niacin nicad rachidian radian ranarian ranch rancid randan ricin

zgrep

Reputation: 1 291

I'm no bash expert, but do you need that space between the opening { and grep? – briantist – 2017-02-26T03:29:41.927

Also note the rule changes, and the subsequent changes I made to the wordlist. – briantist – 2017-02-26T04:45:06.987

@briantist I tried, and apparently I do. However, if it can be an entire program, I can just put it into a file... – zgrep – 2017-02-26T06:43:38.420

I'm not sure this solution (while awesomely short) meets the spec. By my reading, all result words have to be at least five letters long, but this one outputs words like "an", which are too short. – Tutleman – 2017-02-26T14:47:09.957

Oh, whoops. Didn't see that, sorry. – zgrep – 2017-02-26T23:27:40.780

? egrep ^(?=$2)[$1$2]\{5,}$ – mazzy – 2018-11-09T09:48:15.863

PowerShell, 50 44 41 39 43 bytes

param($a,$m)gc z|sls $m|sls "^[$a$m]{5,9}$"

Notes

  • Originally a scriptblock (unnamed function), because the rules required a function; now a full program, since the rules allow that.
  • Takes the "allowed" letters as a single string.
  • Using The English Open Word List, which is provided as a series of lists (one for each letter). I've combined them all into a single file named z. Originally I included only words between 5 and 9 letters, inclusive; due to the new rule, the wordlist and my gist have been updated to include all of the original words, of any size. Here's the list, for anyone else to use.
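Merging the per-letter files into z is a one-off step. Here is a possible Python sketch, assuming plain-text files with one word per line; the function name and the directory and file names in the usage comment are hypothetical. No length filter is applied, in line with the clarified rule.

```python
from pathlib import Path
from typing import Iterable


def combine_word_lists(parts: Iterable[Path], out_path: Path) -> int:
    """Concatenate several word-list files into one file, one word per line.

    No words are dropped apart from blank lines. Returns the number of words written.
    """
    count = 0
    with out_path.open("w", encoding="utf-8") as out:
        for part in sorted(parts):  # deterministic order, e.g. a.txt before b.txt
            for line in part.read_text(encoding="utf-8").splitlines():
                word = line.strip()
                if word:  # skip blank lines only
                    out.write(word + "\n")
                    count += 1
    return count


# Usage with a hypothetical directory of per-letter lists:
# combine_word_lists(Path("eowl").glob("*.txt"), Path("z"))
```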

Explanation

Reads the wordlist file z as lines and does two regular expression matches (by way of Select-String, using its alias sls). The first sls matches the mandatory letter, so its result is all the words that contain that letter. That result is piped into the second sls, which uses an expression that, for the example input, looks like this: ^[acdhirn]{5,9}$ (so it matches only words that consist of those letters and no others, between 5 and 9 characters in length, inclusive).
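One detail worth knowing about this approach: Select-String matches case-insensitively unless -CaseSensitive is given. With the EOWL, which contains no proper nouns, that makes no difference, but with a dictionary that does contain capitalized entries, words such as "Diana" would pass both filters. The effect can be illustrated with Python's re module (a sketch of the same two-pass logic, not PowerShell; the function name is hypothetical):

```python
import re
from typing import Iterable, List


def two_pass_filter(allowed: str, required: str, words: Iterable[str],
                    ignore_case: bool) -> List[str]:
    """Mimic the two sls passes, with or without case folding."""
    flags = re.IGNORECASE if ignore_case else 0
    # First pass: the line contains the mandatory letter.
    has_required = re.compile(re.escape(required), flags)
    # Second pass: 5 to 9 characters, all from the allowed set.
    only_allowed = re.compile("^[" + re.escape(allowed + required) + "]{5,9}$", flags)
    return [w for w in words if has_required.search(w) and only_allowed.search(w)]


words = ["Diana", "India", "Adrian", "ranch", "radii"]
# ignore_case=True models Select-String's default and lets proper nouns through.
print(two_pass_filter("acdhir", "n", words, ignore_case=True))   # ['Diana', 'India', 'Adrian', 'ranch']
print(two_pass_filter("acdhir", "n", words, ignore_case=False))  # ['ranch']
```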

Invocation and Output

&{param($a,$m)gc z|sls $m|sls "^[$a$m]{5,9}$"} 'acdhir' 'n'

Output (57 words):

acarian
acaridan
acini
ahind
anana
anarch
anarchic
arachnid
arcana
arnica
cairn
canard
cancan
candid
candida
canid
canna
cannach
caranna
carina
chain
characin
chicana
china
chinar
chinch
cinch
circadian        #
cnida
cnidarian
cranch
crania
dharna
diarian
dinar
dinic
drain
handcar
harridan         #
inarch
indican
indicia
indri
iridian
nadir
naiad
naira
nanna
niacin           #
nicad
rachidian
radian
ranarian
ranch
rancid
randan
ricin

briantist

Reputation: 3 110

See spec clarification, please don't assume the dictionary has been pre-processed to remove short words. – Michael Stern – 2017-02-26T04:35:32.333

@MichaelStern you just added that; you should make it clearer from the beginning. Specifying a specific wordlist as I suggested also would have made this point moot. I don't care much since it only adds 6 bytes to my answer, but now you've got 3 existing answers written with the old rule. – briantist – 2017-02-26T04:38:14.323

I agree, it would have been better if I had thought to specify it originally. – Michael Stern – 2017-02-26T04:39:42.487

@MichaelStern code and wordlist have been updated. – briantist – 2017-02-26T04:45:36.933

Mathematica, 143 130 bytes

Join@@StringCases[Join@@StringCases[WordList[],RegularExpression["^["<>#2<>#<>"]{5,9}$"]],RegularExpression["\w+(?="<>#<>")\w+"]]&

Invocation (with input)

Join@@StringCases[Join@@StringCases[WordList[],RegularExpression["^["<>#2<>#<>"]{5,9}$"]],RegularExpression["\w+(?="<>#<>")\w+"]]&["n","acdhir"]

Output (20 words)

{"anarchic", "arachnid", "cairn", "canard", "cancan", "candid", "candida", "chain", "china", "cinch", "circadian", "cnidarian", "dinar", "drain", "handcar", "harridan", "niacin", "radian", "ranch", "rancid"}

Explanation

Join@@    // Shortened version of Flatten; removes {} from StringCases lists
    StringCases[    // Find substrings in a string/list that match a regex pattern.
        Join@@
            StringCases[
                WordList[],    // Built-in function; returns a list of English words.
                RegularExpression[
                    "^["<>#2<>#<>"]{5,9}$"    // Take the WordList, and find 5-9 letter
                                              // words made up only of characters from
                                              // the first or second arguments.
                                              // In this case, #2 is "acdhir",
                                              // and # is the letter "n".
                                              // <> is for string concatenation.
                ]
            ],
        RegularExpression[
            "\w+(?="<>#<>")\w+"              // Take the result from the previous
                                              // StringCases function, and find words
                                              // that actually have the character(s) in
                                              // the first argument (#) in them.
                                              // In this case, find words that actually
                                              // have the letter "n" in them.
        ]
    ]
&    // Define an anonymous function.
["n","acdhir"]    // Pass arguments to the function; "#" is "n", and "#2" is "acdhir".
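One property of the second pattern can be checked with Python's re module, which treats this particular pattern the same way (the helper names are hypothetical, and this is not Mathematica code). Because the first \w+ must match at least one character before the lookahead sees the required letter, a word whose only "n" is its first letter, such as "nadir" or "naiad", can never match. A \w* on the left removes that restriction.

```python
import re
from typing import Optional


def second_pass(word: str, required: str = "n") -> Optional[str]:
    """Apply the pattern \\w+(?=n)\\w+ and return the matched text or None."""
    m = re.search(r"\w+(?=" + re.escape(required) + r")\w+", word)
    return m.group(0) if m else None


def contains_required(word: str, required: str = "n") -> bool:
    """Variant with \\w* on the left, so the required letter may come first."""
    return re.fullmatch(r"\w*" + re.escape(required) + r"\w*", word) is not None


for w in ["cairn", "niacin", "nadir", "naiad"]:
    print(w, second_pass(w), contains_required(w))
# cairn cairn True
# niacin niacin True
# nadir None True
# naiad None True
```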

memethyl

Reputation: 61

Welcome to the site! :) – James – 2017-02-26T07:44:08.193

Nice first answer! Here are a couple of golfing tips: you can replace Function[{x,y},...] with ...& as long as you replace every occurrence of x with # and every occurrence of y with #2. (In this case, you call x once and y twice, so you can save one additional byte by exchanging the order of the arguments.) You can also save bytes by giving a name to the function Join@@StringCases[#,RegularExpression[#2]]& and using the name twice, or perhaps Fold. Remember, clarity is not key here: using as few bytes as possible is ... happy golfing! – Greg Martin – 2017-02-26T23:10:39.020

Python 3.6, 84 78 bytes

-6 bytes thanks to WuTheFWasThat (splitlines -> split and {c}and 4<len(w)<10 -> {c*(4<len(w)<10)})

lambda s,c:[w for w in open('z').read().split()if{*w}-{*s}=={c*(4<len(w)<10)}]

An unnamed function that takes a string and a character. It returns a list of those entries (w) from a file for which the set of letters remaining after removing any that are in the "can use" set ({*s}) is exactly the set containing the single "must use" character. The * in {*x} unpacks the string, which saves bytes over set(x). The word-length check is 4<len(w)<10, and string multiplication saves a byte over an and: when the check fails, c*False is the empty string, and {*w}-{*s} can never be a set containing an empty string.
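The golfed lambda packs several tricks into one comparison. Here is an ungolfed Python sketch of the same test. It takes the words as a list rather than reading z, an assumption made so it can be tried without a file; the function name is hypothetical.

```python
from typing import Iterable, List


def bee_ungolfed(s: str, c: str, words: Iterable[str]) -> List[str]:
    """Ungolfed form of the lambda above, taking the words as an argument."""
    result = []
    for w in words:
        leftover = set(w) - set(s)   # {*w}-{*s}: letters not in the "can use" set
        length_ok = 4 < len(w) < 10  # 5 to 9 letters inclusive
        # c * True == c and c * False == "", so the target is {c} or {""};
        # leftover never contains "", so a failed length check cannot match.
        if leftover == {c * length_ok}:
            result.append(w)
    return result
```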

Requires a file in the current directory named simply z containing the list of words separated by newlines (or carriage returns or both).

Taking a list of words would be preferable at 58 bytes.

With s='acdhri' and c='n' and using a file containing all entries and suffixed entries from the Chambers Scrabble™ Words (CSW) dictionary A.K.A. SOWPODS (which explicitly excludes proper nouns) we get the following resulting list of 85 words:

['acarian', 'acaridan', 'acaridian', 'acinar', 'acini', 'acinic', 'acridin', 'adhan', 'ahind', 'anana', 'anarch', 'anarchic', 'anicca', 'aniridia', 'aniridic', 'arachnid', 'arcadian', 'arcan', 'arcana', 'arnica', 'cairn', 'canada', 'canard', 'cancan', 'cancha', 'candid', 'candida', 'canid', 'canna', 'cannach', 'caranna', 'cardan', 'carina', 'chain', 'chana', 'characin', 'chicana', 'china', 'chinar', 'chinch', 'cinch', 'circadian', 'cnida', 'cnidarian', 'cranachan', 'cranch', 'crani', 'crania', 'darrain', 'dharna', 'diarian', 'dinar', 'dinic', 'dinna', 'drain', 'hainch', 'hanch', 'handcar', 'hariana', 'harridan', 'hinahina', 'inarch', 'india', 'indican', 'indicia', 'indri', 'iridian', 'nadir', 'naiad', 'naira', 'nandin', 'nandina', 'nanna', 'naric', 'niacin', 'nicad', 'rachidian', 'radian', 'radicand', 'ranarian', 'ranch', 'rancid', 'randan', 'ranid', 'ricin']

The word "india" is listed by this dictionary, even though, due to the rules of Scrabble™, it explicitly excludes proper nouns, as: "(In international radio communication) a code word for the letter i [-S]" ('-S' indicating that "indias" is also acceptable). Of course we could use any other file listing the words, but do note that "india" is also listed in TWL, the equivalent dictionary used in official Scrabble™ tournaments in the U.S. and Canada.

Jonathan Allan

Reputation: 67 804

Don't presume your wordlist has already been screened to remove 1-4 character words. – Michael Stern – 2017-02-26T04:30:36.117

@MichaelStern I may be blind, was that in the spec? – Jonathan Allan – 2017-02-26T04:32:27.550

@MichaelStern Also note, both other entries have done the same! – Jonathan Allan – 2017-02-26T04:33:38.193

@MichaelStern the other 2 answers do that (started by me), because you had us go out and find a wordlist instead of specifying one. I, accordingly, did so, and since I had to pre-process it into a single file anyway, I kept only 5-9 letter words. It wasn't restricted in the spec (and I've put my wordlist up for anyone to use). – briantist – 2017-02-26T04:34:52.583

I have clarified the rules, but the intent is to create code that finds words that follow the Spelling Bee rules. Pre-processing the dictionary changes the challenge. None of this is intended as disrespect of your code, which can be adapted easily. – Michael Stern – 2017-02-26T04:37:48.180

@MichaelStern I have updated the code to reflect the updated spec. – Jonathan Allan – 2017-02-26T04:45:01.870

@JonathanAllan Your list also contains china. The wordlist I use doesn't contain india or any other proper nouns, and is available for use (linked in my answer). – briantist – 2017-02-26T04:49:58.507

@briantist china seems indisputably valid - it should appear as both a noun and a proper noun in a normal dictionary. EDIT: I see it in your list too.

– Jonathan Allan – 2017-02-26T04:56:13.587

@JonathanAllan right you are! – briantist – 2017-02-26T17:25:42.620

@WuTheFWasThat - thanks for the saves! (FYI it's standard practice to just comment with such help) – Jonathan Allan – 2018-07-12T17:53:57.990

Perl 6, 48 41 bytes

Earlier 48-byte version:

{grep /^@_*[$^a@_*]+$/,grep *.comb>4,1.IO.words}

Current 41-byte version:

{grep /^@_**5..*$/&/$(@_[0])/,1.IO.words}

Expects a whitespace-separated wordlist as a file called 1 in the current directory.
Expects the required character as the first argument, and a list of the additional allowed characters as the second argument (or as multiple arguments, it doesn't matter), all in lowercase.

How it works

{                                       } # A lambda. (@_ is the flattened argument list.)
                              1.IO.words  # Read words from file as a lazy sequence.
 grep /          /&/        /             # Filter words which match *both* these regexes:
       ^@_**5..*$                         #   1) Consists of 5+ allowed chars.
                    $(@_[0])              #   2) Contains the required char.
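A rough Python analogue of this lambda, assuming the argument order described above (required character first, then a list of single allowed characters) and a whitespace-separated file; the function names are hypothetical. The generator reads the file lazily, much as 1.IO.words does, and **5..* corresponds to {5,} in Python's regex syntax.

```python
import re
from pathlib import Path
from typing import Iterator, List


def iter_words(path: Path) -> Iterator[str]:
    """Lazily yield whitespace-separated words from a file."""
    with path.open(encoding="utf-8") as f:
        for line in f:
            yield from line.split()


def bee_words(path: Path, required: str, allowed: List[str]) -> List[str]:
    # Five or more characters, each one of the arguments.
    letters = re.compile("[" + re.escape(required + "".join(allowed)) + "]{5,}")
    # Keep words satisfying *both* conditions.
    return [w for w in iter_words(path) if letters.fullmatch(w) and required in w]
```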

smls

Reputation: 4 352

Perl, 43 + 1 bytes (-n switch)

/^[^\p{Lu}](?i)(?=.*n)[acdhirn]{5,9}$/&&say

Run on Linux:

perl -nE '/^[^\p{Lu}](?i)(?=.*n)[acdhirn]{5,9}$/&&say' < /usr/share/dict/american-english

Output: 33 words without proper nouns

anarchic
arachnid
banana
bandana
bandanna
branch
canard
cancan
candid
circadian
crania
enrich
errand
farina
grandad
granddad
handcar
harridan
maharani
mandarin
maniac
marina
niacin
ocarina
ordain
picnic
piranha
rancid
saccharin
tannin
unhand
urchin
zinnia
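In this regex, [^\p{Lu}] is an ordinary character class, so it consumes the first character of the word, and [acdhirn]{5,9} is then applied only to the characters after it. Here is a Python sketch that keeps the proper-noun test separate from the letter test, so every letter, including the first, must come from the allowed set. The function name is hypothetical, and str.isupper() only approximates \p{Lu}, which Python's re module does not support.

```python
import re
from typing import Iterable, List


def lowercase_bee(allowed: str, required: str, lines: Iterable[str]) -> List[str]:
    """Reject capitalized entries, then apply the letter rules to the whole word."""
    letters = re.compile("[" + re.escape(allowed + required) + "]{5,}")
    result = []
    for line in lines:
        word = line.strip()
        # Unicode-aware uppercase test on the first character; consumes nothing.
        if not word or word[0].isupper():
            continue
        # The first letter is still checked against the allowed set here.
        if letters.fullmatch(word) and required in word:
            result.append(word)
    return result
```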

Toto

Reputation: 909

PowerShell, 41 bytes

Inspired by briantist's answer.

briantist's answer uses three cmdlets; I wanted to stay within one cmdlet.

param($a,$m)sls "^(?=.*$m)[$a$m]{5,9}$" z

As in briantist's answer, I used The English Open Word List files combined into a single file named z.

As in other answers, the regex subpattern ^[$a$m]{5,9}$ asserts that a line consists only of allowed and required characters, 5 to 9 of them.

The basic difference: (?=.*$m) asserts that the required character can be matched somewhere in a line, without consuming characters. See docs.
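The lookahead approach can be sketched in Python as well (the function name is hypothetical). Note that Python's re is case-sensitive by default, whereas Select-String is not.

```python
import re
from typing import Iterable, List


def one_pattern(allowed: str, required: str, words: Iterable[str]) -> List[str]:
    """Single-regex filter of the form ^(?=.*m)[am]{5,9}$."""
    m = re.escape(required)
    # (?=.*m) is zero-width: it checks that the required letter occurs somewhere
    # ahead without consuming anything, so [am]{5,9} still starts at position 0.
    pattern = re.compile("^(?=.*" + m + ")[" + re.escape(allowed) + m + "]{5,9}$")
    return [w for w in words if pattern.search(w)]
```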

Unfortunately, sls outputs additional information (the file name and line number), not only the matches. The rules don't forbid it :-). To display only the words, you need to get the value of the Line property (47 bytes) or call the cmdlet gc (43 or 44 bytes).

Test script:

$f = {

                                                    # the output contains words and extra info
 param($a,$m)sls "^(?=.*$m)[$a$m]{5,9}$" z          # 41 bytes, 1 command
#param($a,$m)sls $m z|sls "^[$a$m]{5,9}$"           # 40 bytes, 2 commands

                                                    # only words solutions:
#param($a,$m)sls "^(?=.*$m)[$a$m]{5,9}$" z|% l*e    # 47 bytes, 1 command + 1 property
#param($a,$m)gc z|sls "^(?=.*$m)[$a$m]{5,9}$"       # 44 bytes, 2 commands
#param($a,$m)gc z|sls $m|sls "^[$a$m]{5,9}$"        # 43 bytes, 3 commands, briantist's https://codegolf.stackexchange.com/a/111293/80745

}

&$f 'acdhir' 'n'

Output:

z:542:acarian
z:547:acaridan
z:854:acini
z:2134:ahind
z:3624:anana
z:3644:anarch
z:3647:anarchic
z:5042:arachnid
z:5117:arcana
z:5466:arnica
z:14834:cairn
z:15334:canard
z:15341:cancan
z:15363:candid
z:15364:candida
z:15414:canid
z:15435:canna
z:15440:cannach
z:15866:caranna
z:16065:carina
z:17442:chain
z:17721:characin
z:18241:chicana
z:18408:china
z:18411:chinar
z:18418:chinch
z:19142:cinch
z:19213:circadian
z:20286:cnida
z:20288:cnidarian
z:23975:cranch
z:23983:crania
z:28928:dharna
z:29121:diarian
z:29648:dinar
z:29687:dinic
z:32006:drain
z:48944:handcar
z:49311:harridan
z:54033:inarch
z:54380:indican
z:54393:indicia
z:54473:indri
z:56194:iridian
z:70546:nadir
z:70575:naiad
z:70595:naira
z:70655:nanna
z:71643:niacin
z:71659:nicad
z:87803:rachidian
z:87884:radian
z:88295:ranarian
z:88304:ranch
z:88321:rancid
z:88329:randan
z:92641:ricin

mazzy

Reputation: 4 832