Background

MENACE (Machine Educable Noughts And Crosses Engine) is a rudimentary shallow machine learning algorithm for the game Noughts and Crosses, created by British computer scientist Donald Michie in the 1960s. It was originally implemented with 304 matchboxes, each labelled with a board position and containing coloured beads (with one of nine colours, representing possible moves). Michie calculated that these 304 matchboxes were enough for every combination of moves on the board.

The more mathematical among you may realise that there are actually 19,683 possible combinations of Noughts, Crosses and Blanks on a N&C board; however, he calculated ways to cut down on this number (to speed up the algorithm, and likely to cut down on matchboxes!). Firstly, he removed all none-possible moves, such as:

-------
|X|0|X|
| |0| |
|X|X| |
-------

(two noughts and four crosses)

Next, he compensated for rotations. For instance, if on the matchbox we see:

-------
| |0|0|
|X| |X|
| |0| |
-------

we can use the same box for

-------
| |X| |
|0| |0|
| |X|0|
-------

Therefore, the aforementioned coloured beads represent relative positions, not absolute ones. For instance, if we said that a red bead meant top-left, then we'd take a look at the image on the top of the box and see:

-------
| |0|0|
|X| |X|
| |0| |
-------

so we'd know that in the case that this is the board, then the red bead would mean:

-------
|R|0|0|
|X| |X|
| |0| |
-------

But if this is the board:

-------
| |X| |
|0| |0|
| |X|0|
-------

the red bead would mean

-------
| |X|R|
|0| |0|
| |X|0|
-------

These transformations applied for rotations and inverting (in all directions, including diagonal). Once again, you only need to save each matchbox once this way: do not make separate virtual boxes for each transformation!

Another simplification Michie made was to make sure the computer goes first. This way, he could remove all first-level moves, removing about a fifth of the remaining boxes. Finally, he removed all game-ending boxes (as no further 'contents' or moves were required on these steps).

Right, now onto the algorithm itself (it's very simple):

First, decide on what the colours of the beads represent. You'll need 9 colours to represent each of the spaces on the board.
At the start of the game, each of the 304 matchboxes contains beads. While the beads are of random colour (so duplicates are fine), they should be possible moves (so if the board state image depicts an 'O' in the middle-right, then you can't use the bead that represents the middle-right).
Every time it is MENACE's (X) turn, find the matchbox with the current board position (or some transformation of it) printed onto it.
Open the matchbox, and choose any bead in there at random.
Find how the board status has been transformed to get to the image on the matchbox (e.g rotated 90deg anticlockwise). Then, apply that transformation to the bead (e.g top-left becomes left-left).
Place an X in that square. Remove the selected bead from the matchbox. If the box is left empty as a result, put three random (possible) beads into the box, and pick one of them for the move.
Repeat 3-6 until the game is over.
If MENACE won the game, go back through every matchbox MENACE took. Then, trace back what colour bead it used on that move. Put two of that colour of bead into the box (so that there is the original bead + one more, thereby increasing the likelyhood of MENACE making that move next time it gets to that position)
If MENACE lost the game, do nothing (don't replace the beads it took out).
If MENACE drew the game, then replace the bead it used in each of its moves, but don't add an extra one, so that you're left with what you started.

This leaves us with an algorithm that is very simple, but difficult to implement. This forms the basis of your challenge.

If you're still confused, see http://chalkdustmagazine.com/features/menace-machine-educable-noughts-crosses-engine/ - it's what I read when I first learned about this algorithm

Challenge

Play a game of Tic-Tac-Toe with the computer. At each step, output the contents of all of the matchboxes.

Inputs

At the start of the program a number, saying how many games you want to play against MENACE
Then, after MENACE's first turn, you input your move as a two character string, the first letter being "L", "R", or "M" (left, right or middle) referring to the Y axis. Then you input another letter (again, "L", "R", or "M"), this time referring to the X axis. Repeat for all moves and games.

Outputs

At the start of each new game, output "new game".
After each move by the player, output the board in any reasonable format. It doesn't need to look pretty (e.g an array of arrays representing the positions of the board is fine).
After each move by the player, MENACE should make a move. Output the board after MENACE's move
After each game, output the contents of all 304 matchboxes. Beads can be represented by a letter, name of a colour, character or whatever string or integer you like (no pointers, anonymous functions, etc).

Rules

This is code-golf, so shortest answer in bytes wins.
I must be able to input moves after seeing MENACE's response. No 'pass all of your moves into this function, and watch how the game plays out'.
The board must be cleared between games.
The matchboxes must not be cleared between games (this would reset the machine learning)
You must have 304 matchboxes. Anyone can implement this algorithm with all 19,683 matchboxes, but the learning is slow (as it requires lots of games to get all of them with useful contents).
The output can be in any reasonable format, and input can be taken as per PPCG standards (as long as it complies with rule 2). If you need to adjust the input format (as described in the 'Input' section) then it's OK as long as it makes sense.
A game ends when a player wins (by getting three in a row diagonally, horizontally or vertically) or if there is a draw (the board is full and there is no winner)
While MENACE needs to make possible moves (and only have possible beads inside each matchbox), for the sake of the challenge you don't need to validate the input of the user. If they type in something wrong, your program can do whatever (go completely crazy, throw error, etc.) - you can assume that the input is correct.

Geza Kerecsenyi

Posted 2019-02-28T20:56:39.230

Reputation: 1 892

I remember Martin Gardner demonstrating the idea using the simpler game Hexapawn, although I forget what he named the "computer" that he constructed. – Neil – 2019-02-28T21:13:17.640

@Neil https://personalpages.manchester.ac.uk/staff/Andrew.Hazel/learning_machine/learning_machine.html maybe?

– Geza Kerecsenyi – 2019-02-28T21:51:31.513

https://www.youtube.com/watch?v=R9c-_neaxeU – J42161217 – 2019-03-01T12:35:37.713

Great challenge. Couple of quick questions: 1. Once there is more than one bead in a given space in a box, how should that be represented in the output? 2. Do you really want all 304 boxes (2736 cells) output after each move? – Nick Kennedy – 2019-03-01T21:17:27.540

@NickKennedy Thanks for the feedback. The way I'd expect the beads to be represented when it's logged is as an array (though you can do it differently to not restrict different languages), e.g if you chose numbers to represent the beads: [[0, 2, 6], [4, 8, 4, 3, 3], [7, 7, 7, 7, 7, 7, 7, 8], [1], ... [3, 3, 5, 4]]. – Geza Kerecsenyi – 2019-03-01T22:33:31.730

The only reason I'd like to output the matchboxes is because I read in the 'Things to avoid in a challenge' Meta question that you shouldn't write 'Implement this algorithm' as a challenge in itself - let people figure out their own logic/adaptation of the problem. This way, the problem is 'When given these inputs, give an output following this logic'. However, I've edited the question to only output the matchboxes at the end of each game just to neaten the output a bit more. – Geza Kerecsenyi – 2019-03-01T22:33:36.373

Am I skimming a rule or is the number of beads at the beginning never defined anywhere? I see in the article it's the box representing Menace’s first turn had four beads for each different move. The boxes representing the layouts of the board for Menace’s second turn contained three beads for each different move; there were two beads each for Menace’s third; and one of each in the boxes representing Menace’s fourth go. Is this the case here? – Veskah – 2019-07-01T12:09:57.673

@Veskah no, it can have from 1-9 beads. – Geza Kerecsenyi – 2019-07-01T12:38:53.550

Implement MENACE

Background

Challenge

Inputs

Outputs

Rules

Answers

R, 839 bytes