
Learning Shape in Computer Go

David Silver

A brief introduction to Go

Black and White take turns to place stones on the board

Once played, a stone cannot move

The aim is to surround the most territory

Usually played on a 19x19 board

Capturing

The empty points at the ends of the lines radiating from a stone are called its liberties

If every liberty of a connected group of stones is occupied by the opponent, the group is captured

Captured stones are removed from the board
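The capture rule can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the board encoding ('.', 'X', 'O' in a list of lists) and the function names `neighbours`, `group_and_liberties` and `play` are hypothetical, and suicide and ko are deliberately not handled.

```python
from typing import List, Set, Tuple

Point = Tuple[int, int]
Board = List[List[str]]
EMPTY, BLACK, WHITE = ".", "X", "O"


def neighbours(p: Point, size: int) -> List[Point]:
    """Orthogonally adjacent points that lie on the board."""
    r, c = p
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(i, j) for i, j in candidates if 0 <= i < size and 0 <= j < size]


def group_and_liberties(board: Board, p: Point) -> Tuple[Set[Point], Set[Point]]:
    """Flood-fill the connected group containing p; return (stones, liberties)."""
    colour = board[p[0]][p[1]]
    stones, liberties, frontier = {p}, set(), [p]
    while frontier:
        q = frontier.pop()
        for n in neighbours(q, len(board)):
            v = board[n[0]][n[1]]
            if v == EMPTY:
                liberties.add(n)          # an empty adjacent point is a liberty
            elif v == colour and n not in stones:
                stones.add(n)             # same colour: part of the same group
                frontier.append(n)
    return stones, liberties


def play(board: Board, p: Point, colour: str) -> Set[Point]:
    """Place a stone on empty point p, remove opposing groups left without
    liberties, and return the captured points. Suicide and ko are not checked."""
    board[p[0]][p[1]] = colour
    opponent = WHITE if colour == BLACK else BLACK
    captured: Set[Point] = set()
    for n in neighbours(p, len(board)):
        if board[n[0]][n[1]] == opponent and n not in captured:
            stones, liberties = group_and_liberties(board, n)
            if not liberties:
                captured |= stones
    for r, c in captured:
        board[r][c] = EMPTY
    return captured


board = [list(row) for row in ["OX...", ".....", ".....", ".....", "....."]]
print(play(board, (1, 0), BLACK))  # {(0, 0)}: the white corner stone had no liberties left
```

In Atari Go, where the first capture wins, a non-empty result from `play` would end the game.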

Atari Go (Capture Go)

Atari Go is a simplified version of Go

The winner is the first player to capture a stone

Often used to teach Go to beginners

Circumvents several tricky issues:

The game ending only by agreement

Ko (local repetitions of position)

Seki (local stalemates)

Computer Go

Computer Go programs are very weak

Search space is too large for brute-force techniques

No good evaluation functions
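A crude count makes the size of the search space concrete. Assuming nothing beyond the rules above, each point is empty, black or white, so 3^(n*n) is an upper bound on the number of board colourings; it ignores which colourings are legal, so it only indicates the order of magnitude. The function name is hypothetical.

```python
def upper_bound_colourings(n: int) -> int:
    """Every one of the n*n points is empty, black or white: 3**(n*n) colourings.
    This over-counts legal positions, so it only indicates the order of magnitude."""
    return 3 ** (n * n)


for n in (5, 7, 9, 19):
    print(f"{n}x{n}: {float(upper_bound_colourings(n)):.1e} colourings")
# The last line printed is "19x19: 1.7e+172 colourings".
```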

Human intuition (shape knowledge) has proven difficult to capture.

Why not learn shape knowledge, and use it to learn an evaluation function?

Local shape

Local shape describes a pattern of stones

It is used extensively by current Computer Go programs (pattern databases)

Inputting local shape by hand takes many years of hard labour

We would like to:

Learn local shapes by trial and error

Assign a value for the goodness of a shape

Just how good is a particular shape?

Enumerating local shapes

In these experiments all possible local shapes are used as features

Up to a small maximum size (e.g. 2x2)

A local shape is defined to be:

A particular configuration of stones

At a canonical position on the board

Local shapes are used as binary features by the learning algorithm
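A minimal sketch of how enumerated local shapes become binary features, assuming a board stored as a list of strings and reading "canonical position" as a fixed top-left board location (both are assumptions, as is the `(row, col, shape)` encoding). Because every configuration is enumerated, exactly one feature is active per window position and every other feature is 0.

```python
from typing import List, Set, Tuple

Feature = Tuple[int, int, str]   # (top-left row, top-left col, shape): hypothetical encoding


def active_local_shapes(board: List[str], h: int, w: int) -> Set[Feature]:
    """Return the binary features that are 1 for this board; all others are 0.

    A feature is one particular h x w configuration of '.', 'X', 'O' at one
    particular board position. Since every configuration is enumerated, exactly
    one feature is active per window position."""
    n = len(board)
    active = set()
    for r in range(n - h + 1):
        for c in range(n - w + 1):
            shape = "".join(board[r + i][c + j] for i in range(h) for j in range(w))
            active.add((r, c, shape))
    return active


board = ["X....",
         ".O...",
         ".....",
         ".....",
         "....."]
features = active_local_shapes(board, 2, 2)
print(len(features))                 # 16: one per 2x2 window on a 5x5 board
print((0, 0, "X..O") in features)    # True
```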

Invariances

Each canonical local shape can be:

Rotated

Reflected

Inverted (colours swapped)

So a single position may update the same shared feature several times, once for each transformed instance of the shape that appears.
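The invariances can be sketched as a canonicalisation step: generate the eight rotations and reflections of a shape, with and without colour inversion, and use the smallest as the key of a shared weight. The functions are hypothetical, and taking the lexicographic minimum as the canonical form is an assumption.

```python
from typing import List, Tuple

Shape = Tuple[str, ...]   # rows of '.', 'X', 'O'


def rotate(s: Shape) -> Shape:
    """Rotate the shape 90 degrees clockwise."""
    return tuple("".join(row[c] for row in reversed(s)) for c in range(len(s[0])))


def reflect(s: Shape) -> Shape:
    """Mirror the shape left to right."""
    return tuple(row[::-1] for row in s)


def invert(s: Shape) -> Shape:
    """Swap the colours of all stones."""
    swap = str.maketrans("XO", "OX")
    return tuple(row.translate(swap) for row in s)


def variants(s: Shape) -> List[Tuple[Shape, bool]]:
    """The 8 rotations/reflections of s and of its colour inverse.
    The flag records whether colours were inverted."""
    out = []
    for inverted, base in ((False, s), (True, invert(s))):
        cur = base
        for _ in range(4):
            out.append((cur, inverted))
            out.append((reflect(cur), inverted))
            cur = rotate(cur)
    return out


def canonical(s: Shape) -> Tuple[Shape, bool]:
    """Use the lexicographically smallest variant as the key of the shared feature."""
    return min(variants(s))


key_a, inv_a = canonical(("X.", ".."))
key_b, inv_b = canonical(("..", ".O"))
print(key_a == key_b, inv_a, inv_b)   # True True False
```

The slide does not say whether an inverted match should share the weight with its sign flipped; returning the `inverted` flag leaves that choice to the learner.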

Algorithm

Value function is learnt for afterstates

Move selection is done by 1-ply greedy search (ε = 0) over the value function:

Active local shapes are identified

A linear combination is taken

A sigmoid squashing function is applied
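The evaluation and move-selection steps can be sketched as follows. This assumes V(s) estimates the probability that Black ("X") wins, uses only 1x1 shapes for brevity, and builds afterstates by placing a stone without resolving captures; these choices and the weight dictionary are illustrative assumptions.

```python
import math
from typing import Dict, List, Set, Tuple

Feature = Tuple[int, int, str]   # (row, col, shape): hypothetical encoding


def active_features(board: List[str], h: int = 1, w: int = 1) -> Set[Feature]:
    """One active local shape per h x w window (1x1 by default)."""
    n = len(board)
    return {(r, c, "".join(board[r + i][c + j] for i in range(h) for j in range(w)))
            for r in range(n - h + 1) for c in range(n - w + 1)}


def value(board: List[str], weights: Dict[Feature, float]) -> float:
    """Linear combination of active shape weights, squashed by a sigmoid."""
    total = sum(weights.get(f, 0.0) for f in active_features(board))
    return 1.0 / (1.0 + math.exp(-total))


def afterstate(board: List[str], r: int, c: int, colour: str) -> List[str]:
    """Board after placing a stone at (r, c); captures are not resolved here."""
    row = board[r]
    return board[:r] + [row[:c] + colour + row[c + 1:]] + board[r + 1:]


def greedy_move(board: List[str], weights: Dict[Feature, float],
                colour: str = "X") -> Tuple[int, int]:
    """1-ply greedy search with epsilon = 0 over afterstate values.
    V is read as P(Black wins), so Black maximises and White minimises."""
    n = len(board)
    moves = [(r, c) for r in range(n) for c in range(n) if board[r][c] == "."]
    pick = max if colour == "X" else min
    return pick(moves, key=lambda m: value(afterstate(board, m[0], m[1], colour), weights))


weights = {(1, 1, "X"): 2.0}      # hypothetical learned weight for a black centre stone
print(greedy_move(["...", "...", "..."], weights))   # (1, 1)
```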

Backups are performed using TD(0)

Reward of +1 for winning, 0 for losing
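A sketch of the TD(0) backup for this linear-sigmoid value function, assuming the common semi-gradient form in which each active feature's weight moves by α·δ·V(1−V). The step size `alpha` and the list-of-features interface are assumptions. Listing a feature twice counts it twice, which is one way to handle several matching instances of a shared shape.

```python
import math
from typing import Dict, Hashable, List, Optional


def value(active: List[Hashable], weights: Dict[Hashable, float]) -> float:
    """Sigmoid of the summed weights; a feature listed k times contributes k times."""
    return 1.0 / (1.0 + math.exp(-sum(weights.get(f, 0.0) for f in active)))


def td0_update(weights: Dict[Hashable, float], active: List[Hashable],
               next_active: Optional[List[Hashable]], reward: float,
               alpha: float = 0.1) -> float:
    """One TD(0) backup from afterstate s to the next afterstate s'.

    Pass next_active=None when the game has ended: the target is then the reward
    (+1 for a win, 0 for a loss). Otherwise the target is reward + V(s')."""
    v_s = value(active, weights)
    target = reward if next_active is None else reward + value(next_active, weights)
    delta = target - v_s
    slope = v_s * (1.0 - v_s)          # derivative of the sigmoid at s
    for f in active:                    # inactive features have zero gradient
        weights[f] = weights.get(f, 0.0) + alpha * delta * slope
    return delta


w: Dict[Hashable, float] = {}
delta = td0_update(w, ["a", "b"], None, reward=1.0)   # a win: the target is 1
print(delta, w["a"])   # 0.5 0.0125
```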

Value function approximation
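Written out, the value function and TD(0) backup described above take roughly the following form. This assumes binary (or count) features φ_i, weights θ_i, a step size α, no discounting, and the usual semi-gradient update; V(s_{t+1}) is taken as 0 at the end of a game, when r_{t+1} is the +1 or 0 reward.

```latex
\begin{aligned}
V(s) &= \sigma\Big(\sum_i \theta_i\,\phi_i(s)\Big), \qquad \sigma(x) = \frac{1}{1+e^{-x}} \\
\delta_t &= r_{t+1} + V(s_{t+1}) - V(s_t) \\
\theta_i &\leftarrow \theta_i + \alpha\,\delta_t\,V(s_t)\big(1 - V(s_t)\big)\,\phi_i(s_t)
\end{aligned}
```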

Training procedure

The challenge: Learn to beat the average liberty player

So the learning algorithm was trained specifically against the average liberty player

The problem: learning is very slow, since the agent almost never wins any games by chance.

The solution: mix in a proportion of random moves until the agent wins 50% of all games.

Reduce the proportion of randomness as the agent learns to win more games.
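One way to implement this schedule is a small controller that tracks a smoothed win rate and nudges the proportion of random moves toward the 50% target. This is a hypothetical sketch: the slide does not give the update rule, and the assumption here is that the random moves replace the opponent's moves, so that more randomness makes the agent win more often.

```python
import random


class RandomMoveSchedule:
    """Hypothetical controller for the proportion of random moves.

    Keeps an exponentially smoothed win rate and lowers the random-move
    probability while the agent wins more than the target (50%), raising it
    while the agent wins less."""

    def __init__(self, p_random: float = 1.0, target: float = 0.5,
                 step: float = 0.01, smoothing: float = 0.05) -> None:
        self.p_random = p_random
        self.target = target
        self.step = step
        self.smoothing = smoothing
        self.win_rate = target          # start neutral

    def use_random_move(self, rng: random.Random) -> bool:
        """Decide whether the next (assumed opponent) move should be random."""
        return rng.random() < self.p_random

    def record_game(self, won: bool) -> None:
        """Update the smoothed win rate, then nudge p_random toward the target."""
        self.win_rate += self.smoothing * ((1.0 if won else 0.0) - self.win_rate)
        if self.win_rate > self.target:
            self.p_random = max(0.0, self.p_random - self.step)
        elif self.win_rate < self.target:
            self.p_random = min(1.0, self.p_random + self.step)


schedule = RandomMoveSchedule(p_random=0.5)
schedule.record_game(won=True)      # a win lowers the proportion of random moves
print(schedule.p_random)
```

A fixed step keeps the controller simple; the smoothing factor trades responsiveness against noise from individual game results.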


Results for different shape sizes

[Figure: percentage wins against training games (thousands) for local shapes of size 1x1, 2x1, 2x2, 3x2 and 3x3]

[Figure: percentage of random moves against training games for local shapes of size 1x1, 2x1, 2x2, 3x2 and 3x3]

Results for different board sizes

[Figure: percentage wins during training on 5x5, 6x6 and 7x7 boards]

Shapes learned (1x1)

Conclusions

Local shape information is sufficient to beat a naïve rule-based player

Significant shapes can be learned

The ‘goodness’ of shapes can be learned

A linear threshold unit can provide a reasonable evaluation function

Enumerating all local shapes reaches a natural limit at 3x3

Training methodology is crucial
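A quick count shows how fast exhaustive enumeration grows with shape size. This is only arithmetic under the enumeration scheme described earlier (every configuration at every position, no sharing across symmetries); the slide does not state which factor sets the limit, and the function names are hypothetical.

```python
def shapes_per_position(h: int, w: int) -> int:
    """Each of the h*w points in the window is empty, black or white."""
    return 3 ** (h * w)


def total_features(board_size: int, h: int, w: int) -> int:
    """Every configuration at every window position (no symmetry sharing)."""
    windows = (board_size - h + 1) * (board_size - w + 1)
    return windows * shapes_per_position(h, w)


for h, w in [(1, 1), (2, 1), (2, 2), (3, 2), (3, 3), (4, 4)]:
    print(f"{h}x{w}: {shapes_per_position(h, w):,} shapes per window, "
          f"{total_features(7, h, w):,} features on a 7x7 board")
```

Going from 3x3 (19,683 configurations per window) to 4x4 (43,046,721) multiplies the number of configurations per window by 3^7 = 2,187.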

Future work

Learn shapes selectively rather than enumerating all possible shapes

Learn shapes to answer specific questions:

Can Black B4 be captured?

Can White connect A2 to D5?

Learn non-local shape:

Use connectivity relationships

Build hierarchies of shapes
