ML - Overview

Tic Tac Toe

A project to investigate different approaches in 'Machine Learning' applied to the task of learning the game of Tic-Tac-Toe. We will be using Neural Networks and Genetic Algorithms.

Normally ML solutions like Neural Nets begin with a large quantity of data that is pre-verified which can be used as a training set. In this project, however, the software will be learning by trial and error only.

This is still very much a 'work in progress'.

Defining Success

The 'fitness' of any solution is measured by playing 1,000,000 matches against an opponent that plays in a purely random fashion, and recording the % of matches that end in a win or draw. A fitness value of 100% (ie it never loses) should be possible from an appropriate ML approach.

Additionally there are a series of opponents with 'scripted behaviour' that can be used for more deeper analysis of the fitness of a solution. These 'Scripted Bots' are designated SBx, where x defines the level of sophistication of the scripting. SB0 has no scripting, has purely random behaviour, and is used for all fitness scoring.

Scripted Bot Fitness

Name	Scripting	Win/Draw Rate %	Win Rate %
SB1	Completes winning lines	74.2	66.7
SB2	Blocks opponent winning lines	96.2	79.3
SB3	Favours center square	98.8	89.1
SB4	Favours corners	99.4	90.4
SB5	A variation of SB4	99.3	92.0

Trainer 1

A single-layer Neural Net (NNv1) with 18 inputs, which uses a 'mutation algorithm' to create variations of the original Net. All variations are tested for fitness and the highest becomes the new 'best solution' and the source for the next generation of mutations.

An initial newly-created NNv1 would have a fitness of 40-60%, which after considerable training would peak at 50-70% fitness.

... leading to the following possibilities:

A single-layer NN with only 18 inputs cannot solve this problem.
Mutations are not sufficient to break out of the local fitness peak.

Trainer 2

A 2-layer Net (NNv2) with 27 inputs, using a back-propagation approach.

Trainer 3

A variant of 'Trainer 1' which begins with multiple competing Nets, allows for 'cross-breeding' among the competing Nets and will discard solutions that repeatedly have low fitness scores.

Trainer 4

A Genetic Algorithm, with its 'DNA' made up from a list of output moves from all possible game grids.

Trainer 5

A 'Brute Force' solution. Due to the smallness of the Tic-Tac-Toe 'problem' we should expect this to find a '100% Fitness' solution reasonably quickly.