Crafting Canonically Cöcher Crosswords

7

The Setup

Most of us are familiar with crossword numbering, which follows three basic rules:

  1. in a character grid consisting of blacks and whites (see below), any maximal contiguous horizontal or vertical chain of whites with length ≥ 2 is considered a word
  2. any white at the leftmost end of a horizontal word and/or at the topmost end of a vertical word is called a "starter"
  3. starters are assigned numbers (sequential integers starting with 1) scanning through the grid in standard English reading order (starting at the top left of the grid, scanning right-fast, down-slow to the bottom right of the grid)

A "crossword" generally refers to any n-by-n square grid of blacks and whites, with n ≥ 2, containing at least one black and at least one word, and containing no whites that do not belong to at least one word.

Some additional properties that a crossword may or may not possess:

  • a crossword is called singly connected if every white in the grid can be connected to every other white in the grid by moving north, east, south, and west (not diagonally) along a path consisting entirely of whites
  • a crossword is called proper if every white belongs to exactly two words (one horizontal, one vertical)
  • a crossword is called symmetric if for all x = 1..n, y = 1..n, the grid cell at (x,y) is a white if and only if the grid cell at (n+1-x,n+1-y) is a white, given 1-based indexing

Most run-of-the-mill cryptic crosswords are singly-connected and symmetric, but not proper. Many newspaper crosswords are proper, singly-connected, and symmetric. The figure below also shows a proper, singly-connected, symmetric crossword.

                 Crossword
       w/o #s              w/ #s (mod 10)
  ##################    12##################
    #   ######  ####    3 45#678######90####
        #       ####    1   2   #34567  ####
      ###     ######    8     ###9    ######
##       ####   ####    ##0   123####456####
#   #    ##       ##    #7  #8   ##90   12##
   ##    ##  #          3  ##4   ##5 #6   78
  ####   #####          9 ####0  #####1     
#   ##     ####   ##    #234##5  67####8  ##
#      #   ##    ###    #9  01 #2  ##34  ###
###    ##   #      #    ###5   ##6 7#8   90#
##   ####     ##   #    ##1  ####2  3 ##4  #
      #####   ####      56   7#####8  ####90
      #  ##    ##       1     #23##4  5##6  
##       ##    #   #    ##7   8  ##9   #0  #
####   ####       ##    ####1  ####2   3  ##
######     ###          ######45678###9   01
####       #            ####23     #45      
####  ######   #        ####6 ######7  #8   
##################      ##################9 

The Problem

...other than finding a crossword puzzle challenge that hasn't already been done. :P

Recall the crossword numbering challenge from days of yore, which accepts a crossword as input and produces clue numbering as output. Here we consider its younger (and significantly harder) cousin by reversing input and output:

Given a series of across/down clue numbers, generate a crossword puzzle whose numbering (per the canonical rules) yields these clue numbers.

For example, consider the clue numbers as input:

Across 3,5,6,7,8,9    Down 1,2,3,4,5,A

Two crosswords with compliant numbering are:

          A                           B
         6x6                         7x7
 w/o #s       w/ #s          w/o #s        w/ #s
 ______       ______        _______       _______
|## # #|     |##1#2#|      |#######|     |#######|
|#     |     |#3   4|      |# ## ##|     |#1##2##|
|  ##  |     |5 ##6 |      |# #  # |     |# #3 #4|
|  ##  |     |7 ##8 |      |  #  # |     |5 #6 # |
|     #|     |9  A #|      | #  #  |     | #7 #8 |
|# # ##|     |# # ##|      |  ## ##|     |9 ##A##|
 ¯¯¯¯¯¯       ¯¯¯¯¯¯       |#### ##|     |#### ##|
                            ¯¯¯¯¯¯¯       ¯¯¯¯¯¯¯

Solution A is singly-connected and symmetric, while B is neither. Solution A is also notably more compact, having only 14 blacks, while solution B has 29 blacks.

Note that you do not need to worry about what letters could/would go into any whites. This challenge is purely geometric in nature.

There are of course many trivial ways of generating compliant crosswords, which takes us to the interesting matter of...

Scoring

This 600-textline file contains 300 sets of across/down clue numbers for crosswords in the format

[across clue numbers for set 1, separated by spaces]
[down clue numbers for set 1, separated by spaces]
[across clue numbers for set 2, separated by spaces]
[down clue numbers for set 2, separated by spaces]

...

[across clue numbers for set 300, separated by spaces]
[down clue numbers for set 300, separated by spaces]

Your objective is to write a program that generates a valid crossword for each set of across/down clue numbers and computes the resulting number of blacks.

The program should produce all 300 crosswords in 30 minutes or less running on a modern computer.

By setting a constant in the program or passing a command line argument, the program should be able to output the crossword grid (solution) for the kth input set given k ∈ 1..300, and you should display representative output (the black/white grid) for at least k = 1 and k = 300 along with your answer. The output can be text- or graphics-based, with or without clue numbers shown (preferably with, mod 36 for text-based graphics), and with any embellishments you desire. (This is simply for validation and showmanship.)

The output should also indicate whether the crossword is proper, singly-connected, and/or symmetric.

Your program's score is the sum total number of blacks over all 300 solutions. The program that produces the lowest total number of blacks is the winner.

per-crossword Bonuses

For any of the 300 crosswords (solutions) generated by your program:

  • if a crossword is proper, multiply its black count by 0.6
  • if a crossword is singly connected, multiply its black count by 0.85
  • if a crossword is symmetric, multiply its black count by 0.75
  • if any two of the above conditions apply for a crossword, multiply its black count by 0.4
  • if all three of the above conditions apply for a crossword, multiply its black count by 0.25

You may only claim one bonus per crossword.

The score for each crossword should be rounded up to the nearest integer.

Note that all of the input sets in the problem correspond to randomly-generated small(ish) singly-connected, proper, symmetric crosswords, hence I guarantee the existence of at least one such a solution.

Feel free to raise any questions or concerns in the comments section.

Happy crosswording. :)

COTO

Posted 2014-10-27T08:14:13.327

Reputation: 3 701

Note that this is more culturally specific than you believe. Certainly in Spain the numbering system is completely different, and although in the UK the numbering system is the same as the one you describe I strongly doubt that the assertion "Most newspaper crosswords are proper" holds. – Peter Taylor – 2014-10-27T09:45:54.780

@PeterTaylor: Spec updated. ;) – COTO – 2014-10-27T12:05:50.267

1If it's singly connected and symmetric, the score is 0.4x. If it's also proper, it should be multiplied by 0.6, but that yields 0.24x which is better than the score for all three conditions. Is this intentional? – Ypnypn – 2014-10-27T13:17:25.547

@Ypnypn: The bonuses don't stack. Basically, claim the single best bonus that applies out of the five options listed. I've clarified this in the spec. – COTO – 2014-10-27T15:55:15.210

Answers

2

Python, total score 127,634(!)

This was a good excercise in humility. The score claimed above is using a fairly naive construction algorithm. I also tried implementing a slightly optimized brute force search, but while this managed to solve a few of the smaller crosswords in the 1-100 range (13x13 or less) it struggled with most and had no chance with the larger ones.

Crossword class

Nothing fancy here...

import itertools
import math
import copy
import sys

def read_crosswords(filename):
  crosswords = []
  with open(filename, 'r') as f:
    for across, down in zip(f, f):
      alist = list(map(int, across.rstrip().split()))
      dlist = list(map(int, down.rstrip().split()))
      crosswords.append(XWord(alist, dlist))
  return crosswords

def solve_crosswords(filename, solver, display=()):
  crosswords = read_crosswords(filename)
  total_score = 0
  for x, i in zip(crosswords, itertools.count(1)):
    print("Crossword {0}... ".format(i), end="")
    sys.stdout.flush()
    assert(solver(x))
    score = x.score(); details = ["score = {0}".format(score)]
    if x.is_symmetric(): details.append("symmetric")
    if x.is_proper(): details.append("proper")
    if x.is_connected(): details.append("connected")
    print("done! ({0})".format(", ".join(details)))
    if i in display: x.print_grid()
    total_score += score
  print("Total score: {0}".format(total_score))
  return crosswords

def flood_fill(grid, x, y, colorA, colorB):
  if grid[x][y] != colorA: return
  grid[x][y] = colorB
  flood_fill(grid, x-1, y, colorA, colorB)
  flood_fill(grid, x+1, y, colorA, colorB)
  flood_fill(grid, x, y-1, colorA, colorB)
  flood_fill(grid, x, y+1, colorA, colorB)

class XWord:

  B,W,U = range(3)

  def __init__(self, across, down):
    self.grid = []                                 # grid (2D array with black border)
    self.n = 0                                     # grid size (excluding border)
    self.num_clues = max(across + down)            # number of clues
    self.across = [(i+1) in across for i in range(self.num_clues)] # acrossness bool array
    self.down = [(i+1) in down for i in range(self.num_clues)]     # downness bool array

  def print_grid(self, numbers=True):
    """Print the grid."""
    i = 1
    for y in range(1,self.n+1):
      for x in range(1,self.n+1):
        if self.grid[x][y] == XWord.B: print("#",end="")
        elif self.grid[x][y] == XWord.W:
          across = (self.grid[x-1][y] == XWord.B) and (self.grid[x+1][y] == XWord.W)
          down = (self.grid[x][y-1] == XWord.B) and (self.grid[x][y+1] == XWord.W)
          if numbers and (across or down):
            print("{0}".format(i % 10), end="")
            i += 1
          else: print(" ", end="")
        else: print("?", end="")
      print()

  def is_symmetric(self):
    """Whether the grid is symmetric."""
    for y in range(1,self.n+1):
      for x in range(1,self.n+1):
        if self.grid[self.n+1-x][self.n+1-y] != self.grid[x][y]: 
          return False
    return True

  def is_proper(self):
    """Whether the grid is proper."""
    for y in range(1,self.n+1):
      for x in range(1,self.n+1):
        if self.grid[x][y] == XWord.W:
          if not ((self.grid[x-1][y] == XWord.W or self.grid[x-1][y] == XWord.W) or
                  (self.grid[x][y-1] == XWord.W or self.grid[x][y+1] == XWord.W)):
            return False
    return True

  def is_connected(self):
    """Whether the grid is connected."""
    grid = copy.deepcopy(self.grid); filled = False
    for y in range(1,self.n+1):
      for x in range(1,self.n+1):
        if grid[x][y] == XWord.W:
          if not filled: 
            flood_fill(grid, x, y, XWord.W, XWord.B)
            filled = True
          else:
            return False
    return True

  def score(self):
    """The grid score."""
    blacks = 0
    for y in range(1,self.n+1):
      for x in range(1,self.n+1):
        if self.grid[x][y] == XWord.U:
          raise Exception("Incomplete grid!")
        elif self.grid[x][y] == XWord.B:
          blacks += 1

    symmetric = self.is_symmetric()
    proper = self.is_proper()
    connected = self.is_connected()

    if (symmetric and proper and connected): multiplier = 0.25
    elif (symmetric and (proper or connected) or proper and connected): multiplier = 0.4
    elif proper: multiplier = 0.6
    elif symmetric: multiplier = 0.75
    elif connected: multiplier = 0.85
    else: multiplier = 1.0

    return int(math.ceil(float(blacks) * multiplier))

Naive solver

This generates a height two sequence, splits it into sections and wraps it round to fill the smallest possible square. Across clues are extended as far as possible to minimize blacks, but other than that there's not much cleverness. The solutions are never connected or proper and in practice never symmetric.

class XWord:

  def simple_solver(self):
    """Naive solver. Converts to a height 2 sequence and wraps round."""

    # generate a string sequence that we'll expand later
    # a=start across, b=start both, c=continue across, d=start down, _=optional space,  =space
    seq = []
    for a,d in zip(self.across, self.down):
      if a:
        if len(seq) > 0:
          if seq[-1] in "ab": seq.append("c")
          seq.append(" ")
        seq.append("a" if not d else "b")
      else:
        if len(seq) > 0:
          if seq[-1] in "bd": seq.append("_")
        seq.append("d")
    if len(seq) > 0 and seq[-1] in "ab": seq.append("c")

    # fit sequence into a square with grid size 3n-1 for minimal n
    n = 0; rows = seq
    while len(rows) > n:
      n += 1; x = 0; rows = []
      while x < len(seq):
        if seq[x] in "_ ": x+=1
        y = min(x+3*n-1,len(seq))
        if seq[y-1] in "ab": y-= 1
        rows.append("".join(seq[x:y]).rstrip(" _"))
        x = y

    # generate grid
    self.n = 3*n-1
    self.grid = [[XWord.B for i in range(self.n+2)] for j in range(self.n+2)]
    y = 1
    for row in rows:
      x = 1; across = False
      for s in row:
        if s in "abcd" or across and s in "_": self.grid[x][y] = XWord.W
        if s in "bd": self.grid[x][y+1] = XWord.W
        if s in "ab": across = True
        x += 1
      for x in range(x,self.n+1):
        if across: self.grid[x][y] = XWord.W
      y += 3
    return True

This runs almost instantaneously and generates an amusingly high score.

>>> solve_crosswords('crossword.txt', XWord.simple_solver,[1,300])
Crossword 1... done! (score = 121)

1 2#3 4#5 6 7
 # # # # # # #
##############
8#9#01#2 #3
 # ## #### ###
##############
4 #56#7 #8 9 0
#### #### # #
##############
1#2#3 #4 #56
 # ######## ##
##############
7 8 9#0 #1 #2
 # # #########

Crossword 2... done! (score = 192)
Crossword 3... done! (score = 203)
...
Crossword 299... done! (score = 666)
Crossword 300... done! (score = 645)

1 2#3 4 5 6#7 8#90 1 2 3#45 6 7
 # # # # # # # ## # # # ## # # #
################################
8 9#0 1 2#34#5 #6 #7 #8 9#0 #1
 # # # # ## ########## # #### ##
################################
2 #34#56#7 #89 0 1#23#45#6 #7
#### ## ##### # # ## ## # ######
################################
89#0 #1 2#34#5 #6 #7 #8 9#0 #1 2
# #### # ## ########## # #### #
################################
3#4#56 7 8 9 0 1 2 3 4#56 7 8#9
 # ## # # # # # # # # ## # # ###
################################
0 #1 #2 #3 #4 #5 6#7 #8 9#0 1 2
############### # #### # # # # #
################################
3 4#5 6#78#90 1 2#3 #4 #5 #6 #7
 # # # ## ## # # ############# #
################################
8 9#0 1 2 3 4#5 #6 #78#9 0 1 2
 # # # # # # ######## # # # # ##
################################
3 #45#67#89#0 #12#34 5#6 #7 #8
 ### ## ## ##### ## # ##########
################################
9 #01 2 3 4#5 6#7 #8 #9 #0 #1 2
#### # # # # # #### ######## # #
################################
34 5#6 #7 #8
# # ############################

Total score: 127634

Slow solver

The brute force solver traverses the grid square by square and decides at each point whether to start new clues, continue existing clues or block clues. Before continuing, it confirms that this choice is compatible with the current view of the grid. There are a number of other optimizations applied whenever we know that the grid is symmetricm, proper or connected.

  • Symmetric: we apply all updates symmetrically and check halfway that half of the across clues and at least half of the overall clues have been processed.
  • Proper: we check for properness whenver we update the grid.
  • Connected: we disallow a completely black row.

Note that there's even a bit of overoptimisation: we currently disallow proper crosswords that begin with a black row, though strictly these are allowed (though I'm not sure they should be). Also, we allow the user to specify the maximum number of black squares to look for to speed up the search.

class XWord:

  def slow_solver(self, n, symmetric=False, proper=False, connected=False, max_blacks=None):
    """Brute force solver with some optimizations. Too slow for much more than 10x10."""

    def _slow_solve(x, y, alist, dlist, updates, all_black_row, max_blacks):
      found = True

      # Early exits
      if connected and x == 1:
        if all_black_row: return False
        else: all_black_row = True
      if symmetric and x == 1:
        if y == self.n // 2 + 1:
          self.across_halfway = self.across.count(True) - alist.count(True)
          if alist.count(True) < self.across_halfway: return False
        if y == (self.n + 1) // 2 + 1:
          if 2 * len(alist) > self.num_clues: return False
          if alist.count(True) != self.across_halfway: return False

      # Apply updates and detect more inconsistencies
      undo = []
      for (i,j,C) in updates:
        # update square
        if self.grid[i][j] == C:
          pass
        elif self.grid[i][j] == XWord.U:
          self.grid[i][j] = C
          if C == XWord.B: max_blacks -= 1
          undo.append((i,j))
        else:
          found = False # inconsistent
          break
        if proper and C == XWord.W:
          if self.grid[i-1][j] == XWord.B and self.grid[i+1][j] == XWord.B or \
             self.grid[i][j-1] == XWord.B and self.grid[i][j+1] == XWord.B:
             found = False
             break
        elif C == XWord.W:
          if self.grid[i-1][j] == XWord.B and self.grid[i+1][j] == XWord.B and \
             self.grid[i][j-1] == XWord.B and self.grid[i][j+1] == XWord.B:
             found = False # lone black square
             break
        # update mirror image
        if symmetric and (i != self.n+1-i or j != self.n+1-j):
          if self.grid[self.n+1-i][self.n+1-j] == C:
            pass
          elif self.grid[self.n+1-i][self.n+1-j] == XWord.U:
            self.grid[self.n+1-i][self.n+1-j] = C
            if C == XWord.B: max_blacks -= 1
            undo.append((self.n+1-i,self.n+1-j))
          else:
            found = False
            break

      # Are we finished?
      if found:
        if y > self.n and len(alist) == 0: return True
        elif y > self.n or max_blacks < 0: found = False

      # Select next square type and recurse
      if found: 
        nextx = x + 1; nexty = y
        if nextx > self.n: nextx = 1; nexty += 1
        found = len(alist) > 0 and _slow_solve(nextx, nexty, alist[1:], dlist[1:], self._start_clue_updates(x, y, alist[0], dlist[0]), False, max_blacks)
        found |= _slow_solve(nextx, nexty, alist, dlist, self._continue_clue_updates(x, y), False, max_blacks)
        found |= _slow_solve(nextx, nexty, alist, dlist, self._block_clue_updates(x, y), all_black_row, max_blacks)

      # Undo updates on error
      if not found:
        for (i,j) in undo:
          self.grid[i][j] = XWord.U

      return found

    # in future, this should try a few values of n until we get one that works.
    self.n = n
    self.grid = [[XWord.U if (0<i<self.n+1) and (0<j<self.n+1) else XWord.B for j in range(self.n+2)] for i in range(self.n+2)]
    if max_blacks is None: 
      max_blacks = self.n * self.n - 2 * max(self.across.count(True), self.down.count(True))
    found = _slow_solve(1, 1, self.across, self.down, [], False, max_blacks)
    return found

  def _start_clue_updates(self, x, y, across, down):
    updates=[(x, y, XWord.W)]
    if across:
      updates.insert(0,(x-1, y, XWord.B))
      updates.append((x+1, y, XWord.W))
    elif self.grid[x-1][y] == XWord.B:
      updates.insert(0,(x+1, y, XWord.B))
    if down:
      updates.insert(0,(x, y-1, XWord.B))
      updates.append((x, y+1, XWord.W))
    elif self.grid[x][y-1] == XWord.B:
      updates.insert(0,(x, y+1, XWord.B))
    return updates

  def _continue_clue_updates(self, x, y):
    updates=[(x, y, XWord.W)]
    if self.grid[x-1][y] == XWord.B:
      updates.insert(0,(x+1, y, XWord.B))
      updates.append((x, y-1, XWord.W))
    elif self.grid[x][y-1] == XWord.B:
      updates.insert(0,(x, y+1, XWord.B))
      updates.append((x-1, y, XWord.W))
    return updates

  def _block_clue_updates(self, x, y):
    updates=[(x, y, XWord.B)]
    return updates

Unfortunately this is too slow to solve all but a few of the smaller crosswords in the puzzle file.

>>> xs = read_crosswords('crossword.txt')
>>> xs[0].slow_solver(13,True,True,True)
True
>>> xs[0].print_grid()
#12##########
3  4#56789###
0   1     ###
##2      ####
#3    #4 ####
#5    6  ####
#####7  #####
####8   9012#
####3 #4    #
####5 6    ##
###7       89
###0    #1
##########2 #
>>> xs[0].score()
18

Uri Granta

Posted 2014-10-27T08:14:13.327

Reputation: 2 675