Create a uniquely solvable crossword ... without clues

22

6

Can you imagine solving the New York Times crossword puzzle without any clues? Maybe not with all of the creativity and new words and phrases appearing in modern crosswords, but with a fixed word list there's some hope. In this challenge, you create a crossword puzzle grid in which this is theoretically possible.

The challenge

Maximize the number of white squares in a white- and black-shaded 15x15 crossword puzzle grid, such that the white squares can be uniquely filled with letters so that every across and down word appears in the international Scrabble word list.

Grid construction clarifications

In US newspapers, crossword grids are usually constructed so that every letter is "checked", meaning it is part of both an "across" word and a "down" word. In the UK and elsewhere (especially in cryptic crosswords), this is not necessarily the case: if an "across" or "down" word would only be one letter, it need not be an actual word (like "A" or "I"). For this challenge, follow the more relaxed rules: single-letter words need not appear in the word list.

There are various other traditions (in the US and elsewhere), none of which need to be followed in this challenge. For example, words can be only two letters long, words are allowed to repeat, and the grid need not have (rotational) symmetry.

Is this even possible?

Yes! One can write a short script to verify that the unique solution to the following blank grid on the left is the filled grid on the right:

15x15 grid with four 15-letter words crossed at their fourth and fifth letters

One can display the filled grid in a computer-readable format as follows:

###CH##########
###YE##########
###AM##########
CYANOCOBALAMINE
HEMOCHROMATOSES
###CH##########
###OR##########
###BO##########
###AM##########
###LA##########
###AT##########
###MO##########
###IS##########
###NE##########
###ES##########

Your solution

The grid above has 56 white squares out of the 225 squares total in the 15x15 grid. This serves as a baseline for this challenge. Grids with fewer white squares may also be interesting for reasons other than their score, for example if they satisfy some of the aesthetic traditions mentioned above.

Please submit your solution in the same format as the computer-readable baseline above. Please include code that verifies that there is a unique solution to your grid.

Interesting code snippets (eg for searching the space of possibilities) and discussion of how you found your grid are appreciated.

The word list

The international Scrabble word list was previously known as SOWPODS and is now called Collins Scrabble Words (CSW). It is used in most countries (except notably the US). We prefer to use this list because it includes British spellings and generally has significantly many words than the American word list. There are multiple editions of this list which differ slightly. You can find different versions of this list linked from Wikipedia, on Github, in Peter Norvig's Natural Language Corpus and elsewhere, often still called "SOWPODS".

This challenge is highly sensitive to the broad nature of the choice of word list, but less so to smaller details. For example, the baseline example above works with any edition of CSW, but CH is not a word in the American Scrabble word list. In the event of a discrepancy, we prefer to use CSW19, the most recent edition of CSW. (If we use this list, which was released this year, we can expect answers to this challenge to remain valid longer). You may query this list interactively on the official Scrabble word finder site or download it (as well as the previous edition, CSW15) from the Board & Card Games Stack Exchange or Reddit's r/scrabble.

Tldr: the authoritative word list for this challenge is available as a plain text file (279,496 words, one per line) over on the Board & Card Games Stack Exchange.

Further discussion

One issue raised in an early answer and comment is why existing crosswords (eg, in the NYT) don't answer this question. Specifically, the record for fewest number of black squares (and thus largest number of white squares) for a published NYT crossword is already the most famous record in crosswords. Why can't we use the record grid? There are a few issues:

  • Many of the answers in NYT crosswords do not appear on our word list. For example, the record grid includes PEPCID (a brand name), APASSAGETOINDIA (a four-word proper name for a film and novel, written without spaces), and STE (an abbreviation for "Sainte"). It appears that the record grid is not solvable with Scrabble words.

  • Merely expanding the word list to include more words does not necessarily help with this challenge: even if all of the words in the record grid appeared on our word list, the solution would not be unique without the clues. It is often possible to alter some letters at the ends of answers while keeping everything a word. (For example, the bottom-right-most letter can be changed from a D to an R.) Indeed, this is part of the (human) construction process when writing a crossword, trying to obtain "better" words.

    The reason ordinary crosswords (usually) have a unique solution is that the clues help narrow down the correct answers. If you simply try to fill the grid with words without using clues, it's likely that there will either be no possibilities or many possibilities. Here's an example of three different fills (using the word list for this challenge!) for the same grid (one that's relatively frequently used in the NYT):

The most common NYT crossword grid, filled in three different ways with Scrabble words.

  • Another issue raised in the comments is some amount of disbelief that this question is a coding challenge. Perhaps it is not immediately clear, but it is hard to even find a single valid answer to this challenge. Finding the baseline above involved multiple, specially-crafted search programs that were not guaranteed to find an answer. I do not personally even know of a general way to solve an arbitrary grid, if you want the answer in reasonable time. Existing crossword construction programs can help, but I assume (perhaps incorrectly) that they do not actually do a full search of the possibilities. (I used such a program for the three side-by-side grids above; this worked because that particular grid allows for many solutions.)

A. Rex

Posted 2019-09-18T13:26:31.040

Reputation: 1 979

Comments are not for extended discussion; this conversation has been moved to chat.

– Doorknob – 2019-09-23T13:40:10.643

2

Meta post related to this general type of questions: https://codegolf.meta.stackexchange.com/questions/18117/questions-that-ask-for-artifacts-or-witnesses

– A. Rex – 2019-09-24T10:25:24.357

I'm voting to reopen anyway, but I recommend the following changes to make further reopen votes more likely: (posting as separate comments so you can see if people agree with each one) – trichoplax – 2019-09-25T20:54:17.760

3>

  • Drop the aesthetic option ("Grids with fewer white squares may also be interesting for reasons other than their score, for example if they satisfy some of the aesthetic traditions mentioned above.") - similarly to avoiding bonuses in code golf, I'd rather a code challenge be about just one thing. This means all the answers can be compared like for like. It also makes it clearly objective, which will help with reopen votes.
  • < – trichoplax – 2019-09-25T20:54:56.783

    4 start="2">

  • Choose a single word list and insist on it for all answers. The tldr mentions an authoratative word list, but the discussion beforehand may lead people to thinking they can pick any of those mentioned. It may help to keep the strict requirements near the top of the post, and to make it very clear that other details are not part of the specification of the challenge. Ideally, omit anything superfluous to the spec to keep the post short and immediately unambiguous.
  • < – trichoplax – 2019-09-25T20:55:45.410

    2 start="3">

  • Make the inclusion of the code used to find the solution a requirement for a valid answer.
  • < – trichoplax – 2019-09-25T20:56:48.800

    Having said all that, I already like this challenge. +1 plus a reopen vote. I personally found the background and further discussion interesting, but they do not form part of the challenge specification so this might work better as a golfed down minimal spec which has a link at the end to the discussion. – trichoplax – 2019-09-25T21:01:35.097

    3This is the kind of challenge that could benefit from a chat room for people to discuss approaches. If you set up a chat room and link to it from the end of the spec, you can post discussion there as the initial posts, and mention this in the challenge for people who want to know more. – trichoplax – 2019-09-25T21:02:49.300

    I find this kind of code challenge both interesting and really challenging to get right. I'd love to see some detailed guidelines as answers to the meta post linked above. – trichoplax – 2019-09-25T21:04:01.577

    Could you add a link to the utility you used to make the baseline grids? Solutions will be prettier if we can use that in addition to the computer-readable format. – Robin Ryder – 2019-09-27T10:04:31.040

    1@RobinRyder: the program I used to create those diagrams is called "qxw". It is open-source and seems to be available for multiple operating systems. In particular, it is in the Ubuntu repositories. – A. Rex – 2019-09-27T11:32:50.973

    Answers

    10

    180 white squares

    Blank grid Solution

    My strategy was simply to find a smaller rectangle with no black squares, such that it can be filled in uniquely. All 2×k rectangles have multiple solutions. For 3×k rectangles, there are multiple solutions for k between 3 and 14, but there is a exactly one solution for k=15.

    I then fit 4 such rectangles in the grid. This means that each word appears 4 times in the solution, which is usually frowned upon in crossword construction, but OK for this challenge. On the other hand, this solution has both left/right and top/down symmetry!

    Computer-readable grid:

    HETERONORMATIVE
    OVEROPINIONATED
    POSSESSEDNESSES
    ###############
    HETERONORMATIVE
    OVEROPINIONATED
    POSSESSEDNESSES
    ###############
    HETERONORMATIVE
    OVEROPINIONATED
    POSSESSEDNESSES
    ###############
    HETERONORMATIVE
    OVEROPINIONATED
    POSSESSEDNESSES
    

    Here is the R code I used to find all solutions for a given grid size. Looping over all triples of 15-letter words is too slow. Instead, I try to fill in rectangles by

    • setting the first two columns (two 3-letter words)
    • then looping through all 15-letter words starting with the first two letters which are now settled.
    • for each possible choice of the 15-letter words, I then verify whether all 3-letter words generated are in the dictionary.

    For example, for the eventual solution, the code first put in HOP and EVO, then completed into HETERNORMATIVE, OVEROPINIONATED and POSSESSEDNESSES, and finally verified all the 3-letter words (HOP, EVO, TES, ERS, ROE, OPS, NIS, ONE, RID, MON, ANE, TAS, ITS, VEE, EDS).

    R code

    library(fastmatch)
    f = "scrabble-wordlist.txt"
    d = read.table(f, skip=2, as.is=T, na.strings=NULL)
    
    d$l = apply(d, 2, nchar)
    d3 = d[d$l==3, 1]
    
    sp = function(s) strsplit(s, "")[[1]]
    cm = function(v) paste0(v, collapse="")
    d3s = sapply(d3, sp)
    
    f3 = function(l){
      m = matrix("", 3, l)
    
      md = sapply(d[d$l == l, 1], sp)
      nf = 0
    
      a1 = seq(1, 3*l, by=3); a2 = a1 + 1; a3 = a1 + 2
    
      for(i in 1:ncol(d3s)){
        m[, 1] = d3s[, i]
    
        id1 = as.matrix(md[, md[1, ] == m[1, 1]])
        id2 = as.matrix(md[, md[1, ] == m[2, 1]])
        id3 = as.matrix(md[, md[1, ] == m[3, 1]])
    
        if(any(ncol(id1) == 0, ncol(id2) == 0, ncol(id3) == 0)) next
    
        for(j in 1:ncol(d3s)){
          m[, 2] = d3s[, j]
    
          jd1 = as.matrix(id1[, id1[2, ] == m[1, 2]])
          jd2 = as.matrix(id2[, id2[2, ] == m[2, 2]])
          jd3 = as.matrix(id3[, id3[2, ] == m[3, 2]])
    
          if(any(ncol(jd1) == 0, ncol(jd2) == 0, ncol(jd3) == 0)) next
    
          for(k1 in 1:ncol(jd1)){
            m[1, ] = jd1[, k1]
    
            for(k2 in 1:ncol(jd2)){
              m[2, ] = jd2[, k2]
    
              for(k3 in 1:ncol(jd3)){
                m[3, ] = jd3[, k3]
    
                w = paste0(m[a1], m[a2], m[a3])
                if(all(w %fin% d3)){
                  nf = nf + 1
                  print(m)
                }
                if(nf >= 2){
                  print(c(l, nf))
                  return()
                }
              }
            }
          }
        }
      }
    
      return(nf)
    }
    

    Called as f3(15). Took a few hours on my personal computer.

    Robin Ryder

    Posted 2019-09-18T13:26:31.040

    Reputation: 6 625

    @downvoter Could you comment? – Robin Ryder – 2019-09-28T17:55:11.263

    My answer was also downvoted. – A. Rex – 2019-10-01T19:58:08.373

    2

    182 white squares

    Four 3x15 regions connected by a couple more white squares.

    Inspired by Robin Ryder's answer, I tried to squeeze in a couple more white squares. I believe this solution is unique, and I will soon post verification code accordingly.

    Computer-readable grid:

    HETERONORMATIVE
    OVEROPINIONATED
    POSSESSEDNESSES
    B##############
    INCOMMUNICATIVE
    NEUROANATOMICAL
    DETERMINATENESS
    ###############
    HETERONORMATIVE
    OVEROPINIONATED
    POSSESSEDNESSES
    B##############
    INCOMMUNICATIVE
    NEUROANATOMICAL
    DETERMINATENESS
    

    A. Rex

    Posted 2019-09-18T13:26:31.040

    Reputation: 1 979

    184 since mon?cot can be completed uniquely with monocot – Jonathan Allan – 2019-10-05T17:49:08.457

    ...make that "maybe..." since I have not verified it wont break uniqueness across the board! – Jonathan Allan – 2019-10-05T18:18:30.517

    I'd be curious to see your verification code. All my attempts to verify your grid are horrendously slow. – Robin Ryder – 2019-10-14T16:31:34.647