Honte, a Go-Playing Program Using Neural Nets

Frederik Dahl

Combined approach

Supervised learning: shape evaluation

Reinforcement learning: group safety, territory

Heuristic evaluation: influence

Search: capture, connectivity, life and death

Architecture

Shape evaluation: Multilayer perceptron

190 inputs: receptive field of radius 3, distance to edge, liberties, captured stones

50 hidden nodes

Single output: will an expert play here?
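The 190-50-1 layout above can be sketched as a plain forward pass. This is a minimal illustration only: the slide does not give the activation functions, the weight initialisation, or the exact encoding of the 190 inputs, so the tanh hidden layer, sigmoid output, and the placeholder feature vector below are all assumptions.

```python
import math
import random

N_INPUTS, N_HIDDEN = 190, 50  # layer sizes taken from the slide


def init_network(rng):
    """Small random weights; each row carries one extra bias weight at the end."""
    w_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(N_INPUTS + 1)]
                for _ in range(N_HIDDEN)]
    w_out = [rng.uniform(-0.1, 0.1) for _ in range(N_HIDDEN + 1)]
    return w_hidden, w_out


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def evaluate(network, features):
    """Return a score in (0, 1) answering 'will an expert play here?'."""
    w_hidden, w_out = network
    # zip stops at the shorter sequence, so row[-1] (the bias) is added separately.
    hidden = [math.tanh(row[-1] + sum(w * x for w, x in zip(row, features)))
              for row in w_hidden]
    return sigmoid(w_out[-1] + sum(w * h for w, h in zip(w_out, hidden)))


rng = random.Random(0)
net = init_network(rng)
# Placeholder input: the real encoding of the receptive field is not given.
features = [rng.choice([0.0, 1.0]) for _ in range(N_INPUTS)]
score = evaluate(net, features)
```

With untrained weights the score is meaningless; it only shows the shape of the computation for one candidate intersection.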


Shape evaluation: Training and performance

Trained on 400 expert games: expert move used as positive example (+1), random legal move as negative example (0)

Error backpropagation: error = target - eval

Performance measured by treating prediction as evaluation function: what percentage of legal moves are ranked below the expert move?
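The training rule and the ranking measure above can be sketched together. This is a toy under stated assumptions, not Honte's code: the class `TinyMLP`, the helper `percent_ranked_below`, the 4-input feature vectors, the learning rate, and the activation functions are all hypothetical and chosen only to keep the example small.

```python
import math
import random


class TinyMLP:
    """One tanh hidden layer and one sigmoid output (activations are assumed)."""

    def __init__(self, n_in, n_hidden, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def _forward(self, x):
        xb = list(x) + [1.0]  # append the bias input
        hb = [math.tanh(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        hb.append(1.0)        # bias unit for the output layer
        z = sum(w * v for w, v in zip(self.w2, hb))
        return xb, hb, 1.0 / (1.0 + math.exp(-z))

    def evaluate(self, x):
        return self._forward(x)[2]

    def train(self, x, target, lr=0.5):
        """One backpropagation step driven by error = target - eval."""
        xb, hb, y = self._forward(x)
        error = target - y
        delta_out = error * y * (1.0 - y)  # squared-error gradient through the sigmoid
        # Hidden weights first, so they use the output weights from before this step.
        for j, row in enumerate(self.w1):
            delta_h = delta_out * self.w2[j] * (1.0 - hb[j] ** 2)
            for i, v in enumerate(xb):
                row[i] += lr * delta_h * v
        for j, v in enumerate(hb):
            self.w2[j] += lr * delta_out * v
        return error


def percent_ranked_below(net, expert_features, other_features):
    """Percentage of the other legal moves that score below the expert move."""
    expert_score = net.evaluate(expert_features)
    below = sum(1 for f in other_features if net.evaluate(f) < expert_score)
    return 100.0 * below / len(other_features)


rng = random.Random(0)
net = TinyMLP(n_in=4, n_hidden=3, rng=rng)
expert = [1, 0, 1, 0]                                # toy features of the expert move
others = [[0, 1, 0, 1], [0, 0, 1, 1], [0, 1, 1, 0]]  # toy features of other legal moves
for _ in range(1000):
    net.train(expert, 1.0)              # expert move as positive example
    net.train(rng.choice(others), 0.0)  # random legal move as negative example
pct = percent_ranked_below(net, expert, others)
```

Pairing each expert move with one random legal move keeps the positive and negative examples balanced, which is presumably why the slide samples negatives rather than using every legal move.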

Shape evaluation: Results

[Figure not included in this transcript]

Local search

Selective search for local goals: capture, connectivity, life and death

Only considers moves suggested by the shape-evaluating network: deep and narrow search; captures common-sense knowledge
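One way to picture a deep and narrow goal search is an any/all search that expands only the few best-scored candidate moves at each node. The sketch below is a hypothetical illustration: the toy capture game (a block with a liberty count and a limited number of extensions), the fixed `shape_score`, and all function names are invented here and stand in for the real board logic and the shape network.

```python
def status(state):
    """True if the block is captured, False if it has escaped, None if open."""
    libs, _, _ = state
    if libs == 0:
        return True
    if libs >= 4:  # toy rule: four liberties counts as escaped
        return False
    return None


def legal_moves(state):
    _, extensions, attacker_to_move = state
    if attacker_to_move:
        return ["fill", "elsewhere"]
    return (["extend"] if extensions > 0 else []) + ["elsewhere"]


def play(state, move):
    libs, extensions, attacker_to_move = state
    if move == "fill":
        libs -= 1
    elif move == "extend":
        libs, extensions = libs + 2, extensions - 1
    return (libs, extensions, not attacker_to_move)


def shape_score(state, move):
    """Stand-in for the shape network: prefers local moves over playing away."""
    return 0.1 if move == "elsewhere" else 0.9


def goal_search(state, depth, top_k=2):
    """True if the attacker can force a capture within `depth` plies."""
    outcome = status(state)
    if outcome is not None:
        return outcome
    if depth == 0:
        return False  # unresolved within the horizon counts as failure
    attacker_to_move = state[2]
    # Narrow: keep only the top_k moves ranked by the shape score.
    candidates = sorted(legal_moves(state),
                        key=lambda m: shape_score(state, m),
                        reverse=True)[:top_k]
    results = (goal_search(play(state, m), depth - 1, top_k) for m in candidates)
    return any(results) if attacker_to_move else all(results)


# State is (liberties, defender extensions left, attacker to move).
capturable = goal_search((2, 0, True), depth=3)
too_shallow = goal_search((2, 1, True), depth=3)
deep_enough = goal_search((2, 1, True), depth=7)
```

Because the branching factor is capped at `top_k`, the depth can be raised cheaply, which is the trade the slide describes.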

Group safety evaluation: Multilayer perceptron

Groups defined by connectable blocks

13 inputs, including: number of stones in group, number of liberties in group, number of proven eyes, average opponent influence over liberties

20 hidden nodes

1 output: probability of group survival

Group safety evaluation: Temporal difference learning

Trained by self-play

Reward signal for the group is the average final safety of stones: 0 = captured, 1 = survived

TD(0) is used, replaying games backwards

Very simple idea: error = eval(next) - eval(now)
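The backward TD(0) replay above can be sketched with a lookup table in place of the network. The table, the state labels, the default value of 0.5, and the learning rate are assumptions made for illustration; the slides describe a multilayer perceptron over 13 group features instead.

```python
def td0_backward(values, episode, reward, lr=0.5):
    """Update `values` in place from one finished game, last position first."""
    target = reward  # the final position is scored by the actual outcome
    for state in reversed(episode):
        current = values.get(state, 0.5)
        error = target - current  # error = eval(next) - eval(now)
        values[state] = current + lr * error
        target = values[state]    # becomes eval(next) for the earlier position
    return values


# Hypothetical group states seen during one self-play game, in playing order.
episode = ["open", "pressed", "two_eyes"]
values = td0_backward({}, episode, reward=1.0)  # 1 = the group survived
```

Replaying from the end means the outcome reaches the earliest position in a single pass, since each position is updated after its successor.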

Influence evaluation

Consider random walks from an intersection

How likely to end up at a black or white stone?

Can also take account of group safety estimates
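The random-walk idea can be sketched as a Monte Carlo estimate. Details such as the number of walks, the step limit, and how walks that never reach a stone are treated are not given on the slide, so the choices below are assumptions; the function name `influence` and the board encoding are hypothetical, and group safety weighting is left out.

```python
import random


def influence(board, start, n_walks=2000, max_steps=50, rng=None):
    """Estimate the probability that a random walk from `start` first reaches
    a black stone and the probability that it first reaches a white stone."""
    rng = rng or random.Random(0)
    size = len(board)
    hits = {"B": 0, "W": 0}
    for _ in range(n_walks):
        r, c = start
        for _ in range(max_steps + 1):
            if board[r][c] in hits:  # the walk ends at the first stone it reaches
                hits[board[r][c]] += 1
                break
            neighbours = [(r + dr, c + dc)
                          for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                          if 0 <= r + dr < size and 0 <= c + dc < size]
            r, c = rng.choice(neighbours)
        # Walks that run out of steps count for neither colour.
    return hits["B"] / n_walks, hits["W"] / n_walks


board = [
    "B....",
    ".....",
    ".....",
    ".....",
    "....W",
]
black, white = influence(board, start=(1, 1))
```

An intersection near the black stone should come out with a higher black value than white value, which matches the intuitive notion of influence.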

Territory evaluation

Another multilayer perceptron

4 inputs: revised influence (for both sides), distance from edge

10 hidden nodes

1 output: predicted territory value

Trained by TD(0) using eventual territory value as reward signal

Playing strength

Playing 19x19 Go: approximately even against Handtalk 97-06e; wins more than 50% against Ego 1.0

Weaknesses: confuses group safety with group strength; has no concept of the aji of a group

Recent work

New version of WinHonte 1.03: neural net to evaluate sente/gote

Trial version available online!

Conclusions

Go knowledge can be learned

Combining different forms of knowledge can be a good idea

Multilayer perceptrons provide a flexible representation

Local search can be used effectively as input features for learning