Implement MENACE

11

2

Background

MENACE (Machine Educable Noughts And Crosses Engine) is a rudimentary shallow machine learning algorithm for the game Noughts and Crosses, created by British computer scientist Donald Michie in the 1960s. It was originally implemented with 304 matchboxes, each labelled with a board position and containing coloured beads (with one of nine colours, representing possible moves). Michie calculated that these 304 matchboxes were enough for every combination of moves on the board.

The more mathematical among you may realise that there are actually 19,683 possible combinations of Noughts, Crosses and Blanks on a N&C board; however, he calculated ways to cut down on this number (to speed up the algorithm, and likely to cut down on matchboxes!). Firstly, he removed all none-possible moves, such as:

-------
|X|0|X|
| |0| |
|X|X| |
-------

(two noughts and four crosses)

Next, he compensated for rotations. For instance, if on the matchbox we see:

-------
| |0|0|
|X| |X|
| |0| |
-------

we can use the same box for

-------
| |X| |
|0| |0|
| |X|0|
-------

Therefore, the aforementioned coloured beads represent relative positions, not absolute ones. For instance, if we said that a red bead meant top-left, then we'd take a look at the image on the top of the box and see:

-------
| |0|0|
|X| |X|
| |0| |
-------

so we'd know that in the case that this is the board, then the red bead would mean:

-------
|R|0|0|
|X| |X|
| |0| |
-------

But if this is the board:

-------
| |X| |
|0| |0|
| |X|0|
-------

the red bead would mean

-------
| |X|R|
|0| |0|
| |X|0|
-------

These transformations applied for rotations and inverting (in all directions, including diagonal). Once again, you only need to save each matchbox once this way: do not make separate virtual boxes for each transformation!

Another simplification Michie made was to make sure the computer goes first. This way, he could remove all first-level moves, removing about a fifth of the remaining boxes. Finally, he removed all game-ending boxes (as no further 'contents' or moves were required on these steps).

Right, now onto the algorithm itself (it's very simple):

  1. First, decide on what the colours of the beads represent. You'll need 9 colours to represent each of the spaces on the board.
  2. At the start of the game, each of the 304 matchboxes contains beads. While the beads are of random colour (so duplicates are fine), they should be possible moves (so if the board state image depicts an 'O' in the middle-right, then you can't use the bead that represents the middle-right).
  3. Every time it is MENACE's (X) turn, find the matchbox with the current board position (or some transformation of it) printed onto it.
  4. Open the matchbox, and choose any bead in there at random.
  5. Find how the board status has been transformed to get to the image on the matchbox (e.g rotated 90deg anticlockwise). Then, apply that transformation to the bead (e.g top-left becomes left-left).
  6. Place an X in that square. Remove the selected bead from the matchbox. If the box is left empty as a result, put three random (possible) beads into the box, and pick one of them for the move.
  7. Repeat 3-6 until the game is over.
  8. If MENACE won the game, go back through every matchbox MENACE took. Then, trace back what colour bead it used on that move. Put two of that colour of bead into the box (so that there is the original bead + one more, thereby increasing the likelyhood of MENACE making that move next time it gets to that position)
  9. If MENACE lost the game, do nothing (don't replace the beads it took out).
  10. If MENACE drew the game, then replace the bead it used in each of its moves, but don't add an extra one, so that you're left with what you started.

This leaves us with an algorithm that is very simple, but difficult to implement. This forms the basis of your challenge.

If you're still confused, see http://chalkdustmagazine.com/features/menace-machine-educable-noughts-crosses-engine/ - it's what I read when I first learned about this algorithm

Challenge

Play a game of Tic-Tac-Toe with the computer. At each step, output the contents of all of the matchboxes.

Inputs

  • At the start of the program a number, saying how many games you want to play against MENACE
  • Then, after MENACE's first turn, you input your move as a two character string, the first letter being "L", "R", or "M" (left, right or middle) referring to the Y axis. Then you input another letter (again, "L", "R", or "M"), this time referring to the X axis. Repeat for all moves and games.

Outputs

  • At the start of each new game, output "new game".
  • After each move by the player, output the board in any reasonable format. It doesn't need to look pretty (e.g an array of arrays representing the positions of the board is fine).
  • After each move by the player, MENACE should make a move. Output the board after MENACE's move
  • After each game, output the contents of all 304 matchboxes. Beads can be represented by a letter, name of a colour, character or whatever string or integer you like (no pointers, anonymous functions, etc).

Rules

  1. This is , so shortest answer in bytes wins.
  2. I must be able to input moves after seeing MENACE's response. No 'pass all of your moves into this function, and watch how the game plays out'.
  3. The board must be cleared between games.
  4. The matchboxes must not be cleared between games (this would reset the machine learning)
  5. You must have 304 matchboxes. Anyone can implement this algorithm with all 19,683 matchboxes, but the learning is slow (as it requires lots of games to get all of them with useful contents).
  6. The output can be in any reasonable format, and input can be taken as per PPCG standards (as long as it complies with rule 2). If you need to adjust the input format (as described in the 'Input' section) then it's OK as long as it makes sense.
  7. A game ends when a player wins (by getting three in a row diagonally, horizontally or vertically) or if there is a draw (the board is full and there is no winner)
  8. While MENACE needs to make possible moves (and only have possible beads inside each matchbox), for the sake of the challenge you don't need to validate the input of the user. If they type in something wrong, your program can do whatever (go completely crazy, throw error, etc.) - you can assume that the input is correct.

Geza Kerecsenyi

Posted 2019-02-28T20:56:39.230

Reputation: 1 892

I remember Martin Gardner demonstrating the idea using the simpler game Hexapawn, although I forget what he named the "computer" that he constructed. – Neil – 2019-02-28T21:13:17.640

https://www.youtube.com/watch?v=R9c-_neaxeU – J42161217 – 2019-03-01T12:35:37.713

Great challenge. Couple of quick questions: 1. Once there is more than one bead in a given space in a box, how should that be represented in the output? 2. Do you really want all 304 boxes (2736 cells) output after each move? – Nick Kennedy – 2019-03-01T21:17:27.540

@NickKennedy Thanks for the feedback. The way I'd expect the beads to be represented when it's logged is as an array (though you can do it differently to not restrict different languages), e.g if you chose numbers to represent the beads: [[0, 2, 6], [4, 8, 4, 3, 3], [7, 7, 7, 7, 7, 7, 7, 8], [1], ... [3, 3, 5, 4]]. – Geza Kerecsenyi – 2019-03-01T22:33:31.730

The only reason I'd like to output the matchboxes is because I read in the 'Things to avoid in a challenge' Meta question that you shouldn't write 'Implement this algorithm' as a challenge in itself - let people figure out their own logic/adaptation of the problem. This way, the problem is 'When given these inputs, give an output following this logic'. However, I've edited the question to only output the matchboxes at the end of each game just to neaten the output a bit more. – Geza Kerecsenyi – 2019-03-01T22:33:36.373

Am I skimming a rule or is the number of beads at the beginning never defined anywhere? I see in the article it's the box representing Menace’s first turn had four beads for each different move. The boxes representing the layouts of the board for Menace’s second turn contained three beads for each different move; there were two beads each for Menace’s third; and one of each in the boxes representing Menace’s fourth go. Is this the case here? – Veskah – 2019-07-01T12:09:57.673

@Veskah no, it can have from 1-9 beads. – Geza Kerecsenyi – 2019-07-01T12:38:53.550

Answers

3

R, 839 bytes

options(max.print=1e5)
s=colSums
r=rowSums
m=matrix
a=array
y=apply
S=sum
p=sample
b=m(rep(i<-1:(K=3^9),e=9)%/%(E=3^(8:0))%%3,c(9,K))
V=a(1:9,c(3,3,8))
V[,,2:4]=c(V[x<-3:1,,1],V[,x,1],V[x,x,1])
V[,,5:8]=y(V[,,1:4],3,t)
d=aperm(a(b[c(V),],c(9,8,K)),c(1,3,2))
v=m(V,9)
g=y(m(match(e<-y(d*E,2:3,S),i),,8),1,min)
g[K]=K
G=9-y(t(e==g)*8:1,2,max)
h=s(a(c(b,d[,,5],b[c(1,5,9,3,5,7,1:3),]),c(3,3,K,3))*3^(0:2))
k=r(s(h==13))>0
l=r(s(h==26))>0
o=s(b>0)
M=b
M[M==0]=-1
repeat{A=b[,t<-K]
z=j=c();B=1
repeat{if(S(pmax(-M[,t],0))<1)M[p(9,pmin(3,S(x)),,x<-M[,t]<1),t]=-1
z=c(z,u<-p(9,1,,pmax(-M[,t],0)))
j=c(j,t)
A[v[,G[B]][u]]=1
print(m(A,3))
if(k[B<-S(A*E)]||o[B]==9)break
A[ceiling((utf8ToInt(readline())-76)/5)%*%c(1,3)+1]=2
if(l[B<-S(A*E)])break
t=g[B]}
M[x]=M[x<-cbind(z,j)]-k[B]+l[B]
print(a(M[,g==seq(g)&!k&!l&s(b==1)==s(b==2)&o<8],c(3,3,304)))}

Try it online!

Quite a long answer, but this wasn't a straightforward challenge. The TIO link here will fail because it expects interactive input. Here's a version that plays against a second, random player who just picks a spot at random. The output for this second version is just the winner (1, 2 or 0 for a draw.) Matchboxes are initialised for all board positions, but only used for the 304 per the spec. They're implemented as copies of the board with negative numbers to indicate the number of beads on each position. I experimented with a list of vectors per the original spec, but it was less intuitive.

This is a less golfed version with comments (but still short variable names). Note that it doesn't print the matchboxes out because they're very long. It can implement an interactive player 2, a random player 2 or the same matchbox strategy for player 2.

auto = 1 # 1 = Random player 2, 2 = Player 2 uses learning strategy
         # 0 for interactive
print_board <- function(board) {
  cat(apply(matrix(c(".", "X", "O")[board + 1], 3), 1, paste, collapse = ""), "", sep = "\n")
}
E = 3 ^ (8:0) # Number of possible arrangements of board
              # ignoring rotations etc.
# Make all possible boards
b = matrix(rep(1:3 ^ 9, e = 9) %/% E %% 3, c(9, 3 ^ 9))
# Define the eight possible rotation/inversion matrices
V = array(1:9, c(3, 3, 8))
V[, , 2:4] = c(V[x <- 3:1, , 1], V[, x, 1], V[x, x, 1])
V[, , 5:8] = apply(V[, , 1:4], 3, t)
# Create eight copies of the 19683 boards with each transformation
d = aperm(array(b[c(V), ], c(9, 8, 3 ^ 9)), c(1, 3, 2))
v = matrix(V, 9)
# Create reverse transformations (which are the same except for rotation)
w = v[, c(1:5, 7, 6, 8)]
# Find the sums of each transformation using base 3
e = apply(d * E, 2:3, sum)
# Find the lowest possible sum for each board's transformed versions
# This will be the one used for the matchboxes
f = matrix(match(e, 1:3 ^ 9), , 8)
g = apply(f, 1, min)
# Store which transformation was necessary to convert the lowest board
# into this one
G = 9 - apply(t(e == g) * 8:1, 2, max)
# Work out which boards have 3-in-a-row
h = colSums(array(c(b, d[, , 5], b[c(1, 5, 9, 3, 5, 7, 1:3), ]), c(3, 3, 3 ^ 9, 3)) * 3 ^ (0:2))
k = rowSums(colSums(h == 13)) > 0 # player 1 wins
l = rowSums(colSums(h == 26)) > 0 # player 2 wins
# Store how many cells are filled
o = colSums(b > 0)
# Create matchboxes. These contain the actual board configuration, but
# instead of zeroes for blanks have a minus number. This is initially -1,
# but will ultimately represent the number of beads for that spot on the
# board.
M = b
M[M == 0] = -1
repeat {
  # Initialise board and storage of moves and intermediate board positions
  A = b[, t <- 3 ^ 9]
  z = j = c()
  C = 1
  # If we're automating player 2 also, initialise its storage
  if (auto) {
    Z = J = c()
  }
  repeat {
    # If the current board's matchbox is empty, put up to three more beads
    # back in
    if (sum(pmax(-M[, t], 0)) == 0) {
      M[sample(9, pmin(3, sum(x)), , x <- M[, t] == 0), t] = -1
    }
    # Take out a bead from the matchbox
    u = sample(9, 1, , pmax(-M[, t], 0))
    # Mark the bead as taken out
    M[u, t] = M[u, t] + 1
    # Store the bead and board position in the chain for this game
    z = c(z, u)
    j = c(j, t)
    # Mark the spot on the board
    A[v[, C][u]] = 1
    # Print the board
    if (!auto) print_board(matrix(A, 3))
    # Check if  player 1 has won or board is full
    if (k[B <- sum(A * E)] || o[B] == 9) break
    if (auto) {
      # Repeat for player 2 if we're automating its moves
      # Note if auto == 1 then we pick at random
      # If auto == 2 we use the same algorithm as player 1
      D = g[B]
      if (sum(pmax(-M[, D], 0)) == 0) {
        M[sample(9, pmin(3, sum(x)), , x <- M[, D] == 0), D] = -1
      }
      U = sample(9, 1, , if (auto == 1) M[, D] <= 0 else pmax(-M[, D], 0))
      Z = c(Z, U)
      J = c(J, D)
      A[v[, G[B]][U]] = 2
    } else {
      cat(
        "Please enter move (LMR for top/middle/bottom row and\nLMR for left/middle/right column, e.g. MR:"
      )
      repeat {
        # Convert LMR into numbers
        q = ceiling((utf8ToInt(readline()) - 76) / 5)
        if (length(q) != 2)
          stop("Finished")
        if (all(q %in% 0:2) && A[q %*% c(1, 3) + 1] == 0) {
          break
        } else {
          message("Invalid input, please try again")
        }
      }
      A[q %*% c(1, 3) + 1] = 2
    }
    if (l[B <- sum(A * E)])
      break
    # Player 2 has won
    t = g[B]
    C = G[B]
  }
  if (auto) {
    cat(c("D", 1:2)[1 + k[B] + 2 * l[B]])
  } else {
    cat("Outcome:", c("Draw", sprintf("Player %d wins", 1:2))[1 + k[B] + 2 * l[B]], "\n")
  }
  # Add beads back to matchbox
  M[x] = M[x <- cbind(z, j)] - k[B] - 1 + l[B]
  if (auto)
    M[x] = M[x <- cbind(Z, J)] - l[B] - 1 + k[B]
}

Nick Kennedy

Posted 2019-02-28T20:56:39.230

Reputation: 11 829

Very clever! Of course, the rotations make it difficult, but thanks for adding the bot player as well! – Geza Kerecsenyi – 2019-03-07T08:08:30.860