Learning Shape in Computer Go David Silver

Post on 08-Jan-2016


Page 1: Learning Shape in Computer Go

Learning Shape in Computer Go

David Silver

Page 2: Learning Shape in Computer Go

A brief introduction to Go

Black and white take turns to place down stones

Once played, a stone cannot move

The aim is to surround the most territory

Usually played on 19x19 board

Page 3: Learning Shape in Computer Go

Capturing

The empty points adjacent to a stone, along the lines radiating from it, are called liberties

If a connected group of stones has all of its liberties removed then it is captured

Captured stones are removed from the board
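
The capture rule can be sketched in code. This is a minimal illustration and is not taken from the slides: the board encoding (a list of strings using `.`, `X` and `O`) and all function names are assumptions made for the example. It flood-fills the connected group containing a stone and collects the group's liberties, taken here to be the empty points orthogonally adjacent to it.

```python
from typing import List, Set, Tuple

# Hypothetical board encoding chosen for this sketch.
EMPTY, BLACK, WHITE = ".", "X", "O"
Point = Tuple[int, int]


def neighbours(p: Point, size: int) -> List[Point]:
    """Orthogonally adjacent points that lie on the board."""
    r, c = p
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(i, j) for i, j in candidates if 0 <= i < size and 0 <= j < size]


def group_and_liberties(board: List[str], start: Point) -> Tuple[Set[Point], Set[Point]]:
    """Flood-fill the group containing `start`; return (stones, liberties)."""
    size = len(board)
    colour = board[start[0]][start[1]]
    if colour == EMPTY:
        raise ValueError("start must be a stone")
    stones, liberties, stack = {start}, set(), [start]
    while stack:
        for q in neighbours(stack.pop(), size):
            cell = board[q[0]][q[1]]
            if cell == EMPTY:
                liberties.add(q)          # empty neighbour: a liberty of the group
            elif cell == colour and q not in stones:
                stones.add(q)             # same colour: part of the same group
                stack.append(q)
    return stones, liberties


def is_captured(board: List[str], start: Point) -> bool:
    """A group with no liberties left is captured."""
    return not group_and_liberties(board, start)[1]


# A white group of three stones with a single liberty at (0, 3).
board = [".XO.",
         "XOOX",
         ".XX.",
         "...."]
stones, libs = group_and_liberties(board, (1, 1))
```

In Atari Go, described later in the talk, a check like `is_captured` after each move would be enough to detect the end of the game.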

Page 5: Learning Shape in Computer Go

Atari Go (Capture Go)

Atari Go is a simplified version of Go

The winner is the first player to capture

Often used to teach Go to beginners

Circumvents several tricky issues:

The game only finishing by agreement

Ko (local repetitions of position)

Seki (local stalemates)

Page 6: Learning Shape in Computer Go

Computer Go

Computer Go programs are very weak:

Search space is too large for brute force techniques

No good evaluation functions

Human intuition (shape knowledge) has proven difficult to capture.

Why not learn shape knowledge?

And use it to learn an evaluation function?

Page 7: Learning Shape in Computer Go

Local shape

Local shape describes a pattern of stones

It is used extensively by current Computer Go programs (pattern databases)

Inputting local shape by hand takes many years of hard labour

We would like to:

Learn local shapes by trial and error

Assign a value for the goodness of a shape

Just how good is a particular shape?

Page 8: Learning Shape in Computer Go

Enumerating local shapes

In these experiments all possible local shapes are used as features

Up to a small maximum size (e.g. 2x2)

A local shape is defined to be:

A particular configuration of stones

At a canonical position on the board

Local shapes are used as binary features by the learning algorithm
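
One way to picture "all possible local shapes as binary features" is the sketch below. It is an illustration under stated assumptions, not the talk's implementation: the indexing scheme, the board encoding (a list of strings using `.`, `X`, `O`) and the names `num_features` and `active_features` are hypothetical, and the all-empty configuration is counted as a shape here, which the slides do not specify. Each (window position, stone configuration) pair gets one feature index, and a board position switches on exactly one feature per window position.

```python
from typing import List

# Hypothetical cell encoding: empty, black, white.
CELL_CODE = {".": 0, "X": 1, "O": 2}


def num_features(board_size: int, h: int, w: int) -> int:
    """Number of (position, configuration) pairs for h x w shapes."""
    positions = (board_size - h + 1) * (board_size - w + 1)
    return positions * 3 ** (h * w)


def active_features(board: List[str], h: int, w: int) -> List[int]:
    """Indices of the h x w shape features that are 'on' for this board."""
    size = len(board)
    n_configs = 3 ** (h * w)
    cols = size - w + 1
    active = []
    for r in range(size - h + 1):
        for c in range(cols):
            # Read the window as a base-3 number, row by row.
            config = 0
            for i in range(h):
                for j in range(w):
                    config = config * 3 + CELL_CODE[board[r + i][c + j]]
            active.append((r * cols + c) * n_configs + config)
    return active


board = ["X....",
         ".....",
         ".....",
         ".....",
         "....."]
features_2x2 = active_features(board, 2, 2)
```

Under these assumptions a 5x5 board has 75 features for 1x1 shapes, 1,296 for 2x2 shapes and 177,147 for 3x3 shapes, which shows how quickly full enumeration grows with shape size.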

Page 9: Learning Shape in Computer Go

Invariances

Each canonical local shape can be:

Rotated

Reflected

Inverted

So each position may cause updates to multiple instances of each feature.
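
The invariances can be illustrated with a short sketch. It assumes that "inverted" means swapping the colours of the stones, which the slide does not spell out, and the shape encoding (a tuple of strings using `.`, `X`, `O`) and function names are hypothetical. The code generates every rotated, reflected and colour-swapped variant of a shape and picks one representative, so that all variants can share a single feature.

```python
from typing import Set, Tuple

Shape = Tuple[str, ...]  # rows of ".", "X", "O" (hypothetical encoding)

SWAP_COLOURS = str.maketrans("XO", "OX")


def rotate(shape: Shape) -> Shape:
    """Rotate the shape 90 degrees clockwise."""
    width = len(shape[0])
    return tuple("".join(row[c] for row in reversed(shape)) for c in range(width))


def reflect(shape: Shape) -> Shape:
    """Mirror the shape left to right."""
    return tuple(row[::-1] for row in shape)


def invert(shape: Shape) -> Shape:
    """Swap black and white stones (assumed meaning of 'inverted')."""
    return tuple(row.translate(SWAP_COLOURS) for row in shape)


def variants(shape: Shape) -> Set[Shape]:
    """All distinct rotations, reflections and colour inversions of a shape."""
    out: Set[Shape] = set()
    current = shape
    for _ in range(4):
        for candidate in (current, reflect(current)):
            out.add(candidate)
            out.add(invert(candidate))
        current = rotate(current)
    return out


def canonical(shape: Shape) -> Shape:
    """Pick one representative per equivalence class (here: the smallest)."""
    return min(variants(shape))


corner = ("X.", "..")
corner_variants = variants(corner)
```

A single black stone in the corner of a 2x2 window has eight variants (four corners, two colours), so one position can touch the same underlying feature several times. How the weight of a colour-inverted shape should relate to the original, for example by a sign change, is not stated on the slide and is left out of this sketch.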

Page 10: Learning Shape in Computer Go

Algorithm

Value function is learnt for afterstates

Move selection is done by 1-ply greedy search (ε = 0) over value function:

Active local shapes are identified

Linear combination is taken

Sigmoid squashing function is applied

Backups are performed using TD(0)

Reward of +1 for winning, 0 for losing
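
The steps above can be sketched as follows. This is a minimal illustration rather than the talk's code: the step size `alpha`, the sparse weight dictionary, the inclusion of the sigmoid's derivative in the update, and the names `value`, `td0_update` and `greedy_move` are all assumptions. The value of an afterstate is the sigmoid of the summed weights of its active shape features, and TD(0) moves that value toward a target: the value of the next afterstate during play, or the final reward (1 for a win, 0 for a loss) at the end of the game.

```python
import math
from typing import Dict, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def value(weights: Dict[int, float], active: Sequence[int]) -> float:
    """Sigmoid of the linear combination of active binary features."""
    return sigmoid(sum(weights.get(i, 0.0) for i in active))


def td0_update(weights: Dict[int, float], active: Sequence[int],
               target: float, alpha: float = 0.1) -> float:
    """One TD(0) backup toward `target`; returns the TD error."""
    v = value(weights, active)
    delta = target - v
    grad = v * (1.0 - v)  # derivative of the sigmoid (assumed to be used)
    for i in active:
        weights[i] = weights.get(i, 0.0) + alpha * delta * grad
    return delta


def greedy_move(weights: Dict[int, float],
                afterstates: Dict[str, Sequence[int]]) -> str:
    """1-ply greedy choice: the move whose afterstate has the highest value.

    `afterstates` maps each legal move to the active features of the
    position reached by playing it (a hypothetical interface).
    """
    return max(afterstates, key=lambda move: value(weights, afterstates[move]))


weights: Dict[int, float] = {}
error = td0_update(weights, [1, 2], target=1.0)  # e.g. the game was won
best = greedy_move(weights, {"a": [1, 2], "b": [3]})
```

With all weights at zero every afterstate is valued at 0.5, so a single winning backup is enough to make the features seen in that game slightly preferred by the greedy search.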

Page 11: Learning Shape in Computer Go

Value function approximation

Page 13: Learning Shape in Computer Go

Training procedure

The two pint challenge: Learn to beat the average liberty player

So the learning algorithm was trained specifically against the average liberty player

The problem: learning is very slow, since the agent almost never wins any games by chance.

The solution: mix in a proportion of random moves until the agent wins 50% of all games.

Reduce the proportion of randomness as the agent learns to win more games.
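
A schedule of this kind could be sketched as a small controller. The slides give no details, so everything concrete here is an assumption: the class name `RandomnessSchedule`, the sliding window of recent games, the fixed step size and the starting proportion. The slide also does not say whose moves are randomised, so the sketch only tracks the proportion and leaves its use to the training loop.

```python
from collections import deque


class RandomnessSchedule:
    """Adjust the proportion of random moves to hold the win rate near a target."""

    def __init__(self, initial: float = 0.5, step: float = 0.01,
                 window: int = 100, target: float = 0.5) -> None:
        self.proportion = initial
        self.step = step
        self.target = target
        self.results = deque(maxlen=window)  # 1 = agent won, 0 = agent lost

    def record(self, agent_won: bool) -> float:
        """Record one game result and return the updated proportion."""
        self.results.append(1 if agent_won else 0)
        if len(self.results) == self.results.maxlen:
            win_rate = sum(self.results) / len(self.results)
            if win_rate > self.target:
                # Winning more than the target: make the task harder.
                self.proportion = max(0.0, self.proportion - self.step)
            elif win_rate < self.target:
                # Winning less than the target: make the task easier.
                self.proportion = min(1.0, self.proportion + self.step)
        return self.proportion


schedule = RandomnessSchedule(initial=0.5, step=0.1, window=4)
history = [schedule.record(True) for _ in range(4)]
```

Holding the win rate near 50% keeps both outcomes common, so the reward signal stays informative while the proportion of random moves falls as the agent improves.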

Page 14: Learning Shape in Computer Go

Results for different shape sizes

[Figure, two panels, one curve per shape size (1x1, 2x1, 2x2, 3x2, 3x3). Panel 1: percentage wins against training games (thousands). Panel 2: percentage random moves against training games (thousands).]

Page 15: Learning Shape in Computer Go

Results for different board sizes

[Figure: percentage wins against training games (thousands), one curve per board size (5x5, 6x6, 7x7).]

Page 16: Learning Shape in Computer Go

Shapes learned (1x1)

Page 17: Learning Shape in Computer Go

Shapes learned (2x2)

Page 18: Learning Shape in Computer Go

Shapes learned (3x3)

Page 19: Learning Shape in Computer Go

Conclusions

Local shape information is sufficient to beat a naïve rule-based player

Significant shapes can be learned

The ‘goodness’ of shapes can be learned

A linear threshold unit can provide a reasonable evaluation function

Enumerating all local shapes reaches a natural limit at 3x3

Training methodology is crucial

Page 20: Learning Shape in Computer Go

Future work

Learn shapes selectively rather than enumerating all possible shapes

Learn shapes to answer specific questions:

Can black B4 be captured?

Can white connect A2 to D5?

Learn non-local shape:

Use connectivity relationships

Build hierarchies of shapes