Matchbox Educable Noughts and Crosses Engine

The Matchbox Educable Noughts and Crosses Engine (sometimes called the Machine Educable Noughts and Crosses Engine) or MENACE was an analogue computer made up of 304 matchboxes designed and built by Donald Michie in 1961. It was designed to play human opponents in games of noughts and crosses (also known as tic-tac-toe) by returning a move for any given state of play and to refine its strategy through reinforcement learning.[1]

A recreation of MENACE built by Matthew Scroggs.

Computer equipment was not readily available for non-government use, so Michie worked around this restriction by building it out of matchboxes.The matchboxes used by Michie each represented a single possible layout of a Noughts and Crosses grid. When the computer first played, it would randomly choose moves based on the current layout. As it played more games, through a reinforcement loop, it disqualified strategies that led to losing games, and supplemented strategies that led to winning games. Michie held a tournament against MENACE in 1961, wherein he experimented with different openings.

Following MENACE's maiden tournament against Michie, it was shown to be a successful computer. Michie's essays on MENACE's weight initialisation and the BOXES algorithm used by MENACE became popular in the field of computer science research. Michie was honoured for his contribution to machine learning research, and was twice commissioned to program a MENACE simulation on an actual computer.

Origin

Michie teaching a group of students at Turing Institute.

Donald Michie had been on the team decrypting the German Tunny Code during World War II.[2] Fifteen years later, he wanted to further display his mathematical and computational prowess with an early convolutional neural network. Since computer equipment was not available for such uses,[3] he decided to display and demonstrate artificial intelligence in a more esoteric format and constructed a working model out of matchboxes and beads.[1][4] (or beans)[5] He chose Noughts and Crosses as it was a simple game with fewer potential moves than other games.

MENACE "learned" by playing increasing matches of Noughts and Crosses. Each time, it would eliminate a losing strategy by the human player confiscating the beads that corresponded to each move. It reinforced winning strategies by making the moves more likely, by supplying extra beads. This was one of the earliest versions of the Reinforcement Loop, the schematic algorithm of looping the algorithm, dropping unsuccessful strategies until only the winning ones remain.[4] This model starts as completely random, and gradually trains.

Construction

MENACE was reportedly constructed as the result of a bet with a computer science colleague who postulated that such a machine was impossible.[6] Michie undertook the task of collecting and defining each matchbox as a 'fun project', later turned into a demonstration tool.[7]

Michie completed his essay on MENACE in 1963,[4] "Experiments on the mechanization of game-learning", as well as his essay on the BOXES Algorithm, written with R. A. Chambers[7] and by then had built up an AI research unit in Hope Park Square, Edinburgh.[8]

Composition

MENACE was made up of 304 matchboxes glued together in an arrangement similar to a chest of drawers.[9] Each box had a code number, which was keyed into a chart. This chart had drawings of tic-tac-toe game grids with various configurations of X's, O's and empty squares,[4] corresponding to all possible permutations a game could go through as it progressed.[10] After removing duplicate arrangements (ones that were simply rotations or mirror images of other configurations), MENACE used 304 permutations in its chart, and that many matchboxes.[11]

Each individual matchbox tray contained a collection of coloured beads.[12] Each colour represented a move on a square on the game grid, and so matchboxes with arrangements where positions on the grid were already taken would not have beads for that position. Additionally, at the front of the tray were two extra pieces of card in a "V" shape,[9] the point of the "V" pointing at the front of the matchbox.[10] Michie and his artificial intelligence team called MENACE's algorithm "Boxes",[8] after the apparatus used for the machine. The first stage "Boxes" operated in five phases, each setting a definition and a precedent for the rules of the algorithm in relation to the game.[13]

Operation

Playing a game

MENACE played first,[14][11] as O, since all matchboxes represented permutations that only one playing as O could ever see.

To retrieve MENACE's choice of move, the opponent or operator located the matchbox that matched the current game state, or a rotation or mirror image of it. For example, at the start of a game, this would be the matchbox for an empty grid. The tray would be removed and lightly shaken so as to move the beads around.[4] Then, the bead that had rolled into the point of the "V" shape at the front of the tray was the move MENACE had chosen to make.[4] Its colour was then used as the position to play on, and, after accounting for any rotations or flips needed based on the chosen matchbox configuration's relation to the current grid, the O would be placed on that square. Then the player performed their move, the new state was located, a new move selected, and so on, until the game was finished.[11]

Finishing a game

When the game had finished, the human player observed the game's outcome. As a game was played, each matchbox that was used for MENACE's turn had its tray returned to it slightly open, and the bead used kept aside, so that MENACE's choice of moves and the game states they belonged to were recorded. Once the game was done, if MENACE had won, it would then receive a "reward" for its victory. The removed beads showed the sequence of the winning moves.[15] These were returned to their respective trays, easily identifiable since they were slightly open, as well as three bonus beads of the same colour.[10] In this way, in future games MENACE would become more likely to repeat those winning moves, reinforcing winning strategies. If it lost, the removed beads were not returned, "punishing" MENACE, and meaning that in future it would be less likely, and eventually incapable if that colour of bead became absent, to repeat the moves that cause a loss.[5] If the game was a draw, one additional bead was added to each box.[10]

Results in practice

Optimal strategy

Optimal strategy for player X if starting in a corner. In each grid, the shaded red X denotes the optimal move, and the location of O's next move gives the next subgrid to examine.

Noughts and Crosses has a well-known optimal strategy.[16] It involves strategic placing to block the other player while simultaneously taking the win. However, if both players use this strategy, it always ends in a draw.[17] This creates a stagnation. If the human player is familiar with the optimal strategy, and the MENACE can quickly learn it, then the games will eventually only end in draws. When the computer begins and plays a random-playing opponent, it has the odds of the computer winning turn quickly in its favour.[1][7]

When playing against a player using optimal strategy, the odds of a draw grow to 100%. In Donald Michie's official tournament against MENACE, (1961)[4] he used optimal strategy, and he and the computer began to draw consistently after twenty games. Michie's tournament[18] had the following milestones: Michie began by consistently opening with "Variant 0", the middle square. At 15 games, MENACE abandoned all non-corner openings. At just over 20, Michie switched to consistently using "Variant 1", the bottom-right square. At 60, he returned to Variant 0. As he neared 80 games, he moved to "Variant 2", the top-middle. At 110, he switched to "Variant 3", the top right. At 135, he switched to "Variant 4", middle-right. At 190, he returned to Variant 1, and at 210, he returned to Variant 0.

The trend in changes of beads in the "2" boxes runs:[18]

VariantMatch numberBead change in "2" box
Variant 0 0 0
Variant 1 20 -5
Variant 0 60 5
Variant 2 70 10
Variant 3 110 20
Variant 4 135 25
Variant 1 190 100
Variant 0 210 120

Correlation

A scatter graph showing the results of Donald Michie's games against MENACE.

Depending on the strategy employed by the human player, MENACE produces a different trend on scatter graphs of wins.[4] Using a random turn from the human player results in an almost-perfect positive trend. Playing the optimal strategy returns a slightly slower increase.[1] The reinforcement does not create a perfect standard of wins; the algorithm will draw random uncertain conclusions each time. After the jth The correlation of near-perfect play runs:

Where Vi is the outcome (+1 is win, 0 is draw and -1 is loss) D is the Decay Factor (average of past values of wins and losses). Below, Mn is the multiplier for the nth round of the game.[4]

Outcome Reinforcement
Won
Draw
Lost

Legacy

Donald Michie's MENACE proved that a computer could "learn" from failure and success to become good at a task.[19] It also used what would become core principles within the field of machine learning before they had been properly theorised. For example, the combination of how MENACE starts with equal numbers of types of beads in each matchbox, and how these are then be selected at random, creates a learning behaviour similar to weight initialisation in modern artificial neural networks.[20]

In 1968, Donald Michie and R.A Chambers made another "BOXES"-based algorithm called GLEE, (Game Learning Expectimaxing Engine)[21] which was tasked to learn how to balance a pole on a cart.[22]

Computer simulation

After the resounding reception of MENACE, Michie was invited to the US Office of Naval Research, where he was commissioned to build a "Boxes"-running program for an IBM Computer for use at Stanford University.[23] Michie went on to create a simulation program of MENACE on a Pegasus 2 computer with the aid of D. Martin.[4]

Modern recreations

There have been multiple recreations of MENACE in more recent years, both in its original physical form and as a computer program.[11][24] Although not as a functional computer, in examples of demonstration, MENACE has been used as a teaching aid[25][26][27] for various neural network classes, including a well-publicised demonstration from Cambridge Researcher Matthew Scroggs.[28][29] A copy of MENACE built by Scroggs was featured in the 2019 Royal Institution Christmas Lectures.[30][31]


gollark: <@199585701547868160> Would you consider cubically claiming the Haskell logo? I'd like to run a tunnel under it.
gollark: Obviously, the number will have to be calculated at runtime, and indices will change dynamically.
gollark: Actually, no: we need to average it based on the number of people supporting each.
gollark: 0.5-indexed arrays.
gollark: How about a compromise?

See also

References

  1. "Menace: the Machine Educable Noughts And Crosses Engine". Chalkdust. 2016-03-13. Retrieved 2020-05-17.
  2. "Computer Pioneers - Donald Michie". history.computer.org. Retrieved 2020-07-19.
  3. Lectures Cultural Informatics Research Group
  4. "Experiments on the mechanization of Game Learning Part 1. Characterization of the model and its parameters" (PDF). Retrieved 2020-06-01.
  5. Sall, Matt (2019-03-25). "Teaching 304 Matchboxes To Beat You At Tic-Tac-Toe". Bell of Lost Souls. Retrieved 2020-07-14.
  6. "Daily Telegraph obituary for Donald Michie". Daily Telegraph. 2007-07-09.
  7. Donald, Michie. BOXES: An experiment in adaptive control. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.474.2430&rep=rep1&type=pdf: University of Edinburgh.CS1 maint: location (link)
  8. Muggleton, Stephen (2007-07-10). "Obituary for Donald Michie, an article in The Guardian from 2007". theguardian.com.
  9. The Science Book, Second Edition, Dorling Kindersley Ltd., 2015, pg. 288
  10. Gardner, Martin (1962). "MATHEMATICAL GAMES". Scientific American. 206, no.3: 17 via JSTOR.
  11. Matchbox Educable Noughts And Crosses Engine In Empirical Modelling
  12. core.ac.uk - The Machine Learning Revolution in AI by Luc De Raedt https://core.ac.uk/download/pdf/80808274.pdf
  13. Russel, David (2012). Springer Professional - Extract from "The BOXES Methodology",. https://www.springerprofessional.de/en/introduction-to-boxes-control/1967928: Springer London. ISBN 9781849965279.CS1 maint: location (link)
  14. "MENACE 2, an artificial intelligence made of wooden drawers and coloured beads". April 12, 2016.
  15. Regine (2016-04-12). "MENACE 2, an artificial intelligence made of wooden drawers and coloured beads". We Make Money Not Art. Retrieved 2020-07-14.
  16. "The best opening move in a game of tic-tac-toe - The Kitchen in the Zoo". blog.maxant.co.uk. Retrieved 2020-07-14.
  17. "Tic-Tac-Toe Strategy". Stephen Ostermiller. 2004-06-15. Retrieved 2020-05-17.
  18. Trial and Error , Michie Donald , Penguin Science Surveys 1961 Vol 2
  19. Dumas, Jacques-Pierre (Jp). "IoT and machine learning are driving network transformation". itbrief.com.au. Retrieved 2020-06-12.
  20. Yam, Jim Y. F.; Chow, Tommy W. S. (2000-01-01). "A weight initialization method for improving training speed in feedforward neural network". Neurocomputing. 30 (1): 219–232. doi:10.1016/S0925-2312(99)00127-7. ISSN 0925-2312.
  21. "1.6 History of Reinforcement Learning". incompleteideas.net. Retrieved 2020-08-01.
  22. Sutton, Richard S.; Barto, Andrew G. (2018-11-13). Reinforcement Learning: An Introduction. MIT Press. ISBN 978-0-262-03924-6.
  23. "Professor Donald Michie". 2007-07-08. ISSN 0307-1235. Retrieved 2020-06-11.
  24. Scaruffi, Piero (2016). Intelligence is not Artificial - Why the Singularity is not coming any time soon and other Meditations on the Post-Human Condition and the Future of Intelligence. https://www.scaruffi.com/singular/download.pdf. p. 30. ISBN 978-0-9765531-9-9.
  25. Zhao, Yibo (1 December 2013). "Machine Educable Engine on Noughts And Crosses in Modelling Study". University of Warwick.
  26. "AI Topics.. Tic-Tac-Toe strategy in Computational Thinking, Introduction, MENACE".
  27. Ute Schmid - "Interactive Learning with Mutual Explanations" (How Humans and Machine Learning Systems can Profit From Each Other) - University of Bamberg, Germanyhttps://www.universiteitleiden.nl/binaries/content/assets/science/dso/ute-schmid_mutualexplleiden-1.pdf
  28. Matthew Scroggs Lecture on MENACE - https://www.youtube.com/watch?v=hK25eXRaBdc
  29. "Inspiring the Next Generation of Computer Scientists | King's Worcester". King's Worcester. 2019-11-11. Retrieved 2020-06-12.
  30. Scroggs, Matthew (27 December 2019). "Visualising MENACE's learning". mscroggs.co.uk.
  31. https://twitter.com/ri_science/status/1210666302890754049

Sources

The BOXES Methodology, a book on the "Boxes" algorithm employed by MENACE.

BOXES: An Experiment in Adaptive Control, Michie and R.A Chambers' paper on the AI implications of BOXES and MENACE.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.