Honte, a Go-Playing Program Using Neural Nets
Frederik Dahl
Combined approach
Supervised learning: shape evaluation
Reinforcement learning: group safety, territory
Heuristic evaluation: influence
Search: capture, connectivity, life and death
Architecture
Shape evaluation: Multilayer perceptron
190 inputs: receptive field of radius 3, distance to edge, liberties, captured stones
50 hidden nodes
Single output: will an expert play here?
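The network shape on this slide (190 inputs, 50 hidden nodes, one output) can be sketched as a plain forward pass. This is a minimal illustration only: the sigmoid activations, the random weight initialisation, the dictionary layout, and the names `init_network` and `evaluate` are assumptions, and the slides do not say how the 190 input features are encoded.

```python
import math
import random

N_INPUTS, N_HIDDEN = 190, 50  # layer sizes taken from the slide


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_network(n_inputs=N_INPUTS, n_hidden=N_HIDDEN, seed=0):
    """Build a one-hidden-layer network with small random weights (assumed init)."""
    rng = random.Random(seed)
    return {
        "w_hidden": [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
                     for _ in range(n_hidden)],
        "b_hidden": [0.0] * n_hidden,
        "w_out": [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)],
        "b_out": 0.0,
    }


def evaluate(net, features):
    """Return a score in (0, 1) for one candidate move's feature vector."""
    if len(features) != len(net["w_hidden"][0]):
        raise ValueError("unexpected feature vector length")
    hidden = [
        sigmoid(sum(w * x for w, x in zip(row, features)) + bias)
        for row, bias in zip(net["w_hidden"], net["b_hidden"])
    ]
    return sigmoid(sum(w * h for w, h in zip(net["w_out"], hidden)) + net["b_out"])


net = init_network()
# Placeholder input: an all-zero vector stands in for real board features.
score = evaluate(net, [0.0] * N_INPUTS)
```

In this sketch a higher `score` would be read as "an expert is more likely to play here"; an untrained network gives no meaningful ranking.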
Shape evaluation: Training and performance
Trained on 400 expert games
Expert move used as positive example (+1); random legal move used as negative example (0)
Error backpropagation: error = target - eval
Performance measured by treating prediction as evaluation function: what percentage of legal moves are ranked below the expert move?
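The performance measure described here can be sketched as follows. The function names and the example scores are hypothetical; the sketch also assumes that the expert move itself is excluded from the comparison set and that ties do not count as "ranked below", neither of which the slide specifies.

```python
def percent_ranked_below(expert_score, other_scores):
    """Percentage of the other legal moves scored strictly below the expert move."""
    if not other_scores:
        return 100.0
    below = sum(1 for s in other_scores if s < expert_score)
    return 100.0 * below / len(other_scores)


def mean_percent_ranked_below(positions):
    """Average the measure over (expert_score, other_scores) pairs, one per position."""
    values = [percent_ranked_below(expert, others) for expert, others in positions]
    return sum(values) / len(values)


# Hypothetical network outputs for two positions.
example = [
    (0.9, [0.1, 0.5, 0.95, 0.2]),  # 3 of 4 other moves score lower
    (0.4, [0.1, 0.3]),             # both other moves score lower
]
result = mean_percent_ranked_below(example)
```

A perfect predictor would reach 100 on this measure, and a random one would be expected to land near 50.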
Shape evaluation: Results
[Figure: shape evaluation results; image not recoverable from the transcript]
Local search
Selective search for local goals: capture, connectivity, life and death
Only considers moves suggested by the shape-evaluating network
Deep and narrow search
Captures common-sense knowledge
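The idea of a deep but narrow goal search can be sketched on a toy problem. Everything below is a stand-in rather than Honte's search: the counting game replaces Go (two players alternately add 1 or 2, and the "attacker" must be the one to reach `TARGET`), and `suggestion_score` is a hypothetical substitute for the shape network. What the sketch does share with the slide is the structure: at each node only the `width` best-ranked moves are tried, so the search can afford to go deep.

```python
TARGET = 10  # toy goal: the attacker must be the player who reaches this total


def legal_moves(state):
    total, _ = state
    return [step for step in (1, 2) if total + step <= TARGET]


def play(state, move, attacker_to_move):
    total, _ = state
    return (total + move, "attacker" if attacker_to_move else "defender")


def suggestion_score(state, move):
    # Hypothetical stand-in for the shape network: prefer moves that leave
    # a multiple of 3 still to go, the winning pattern in this toy game.
    total, _ = state
    return 1.0 if (TARGET - total - move) % 3 == 0 else 0.0


def selective_search(state, depth, width, attacker_to_move=True):
    """True if the attacker can force the goal within `depth` plies,
    trying only the `width` best-ranked moves at every node."""
    total, last_mover = state
    if total == TARGET:
        return last_mover == "attacker"
    if depth == 0:
        return False
    ranked = sorted(legal_moves(state),
                    key=lambda m: suggestion_score(state, m), reverse=True)
    outcomes = (
        selective_search(play(state, m, attacker_to_move),
                         depth - 1, width, not attacker_to_move)
        for m in ranked[:width]
    )
    # The attacker needs one working move; the defender needs every reply refuted.
    return any(outcomes) if attacker_to_move else all(outcomes)


can_win_from_zero = selective_search((0, None), depth=10, width=2)
can_win_from_one = selective_search((1, None), depth=10, width=2)
```

Narrowing the defender's replies as well as the attacker's moves is a design choice with a cost: a refutation that the move suggester misses is never examined, so the result is only as reliable as the suggestions.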
Group safety evaluation: Multilayer perceptron
Groups defined by connectable blocks
13 inputs, including: number of stones in group, number of liberties in group, number of proven eyes, average opponent influence over liberties
20 hidden nodes
1 output: probability of group survival
Group safety evaluation: Temporal difference learning
Trained by self-play
Reward signal for the group is the average final safety of its stones: 0 = captured, 1 = survived
TD(0) is used, replaying games backwards
Very simple idea: error = eval(next) - eval(now)
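The backward TD(0) replay can be sketched as below. To keep the example short, a single logistic unit replaces the 13-20-1 multilayer perceptron; that simplification, the learning rate, the three-feature toy vectors, and the choice to re-evaluate each position after its update before using it as the target for the preceding one are all assumptions, not details from the slides.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def evaluate(weights, features):
    """Estimated survival probability (logistic unit standing in for the MLP)."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)))


def td0_replay_backwards(weights, positions, final_safety, learning_rate=0.1):
    """Update `weights` in place from one game.

    positions: feature vectors for one group, in the order they occurred.
    final_safety: reward in [0, 1] (0 = captured, 1 = survived).
    """
    target = final_safety  # the last position is trained towards the reward
    for features in reversed(positions):
        value = evaluate(weights, features)
        error = target - value  # error = eval(next) - eval(now)
        slope = value * (1.0 - value)  # derivative of the sigmoid
        for i, x in enumerate(features):
            weights[i] += learning_rate * error * slope * x
        # The freshly updated estimate becomes the target one step earlier.
        target = evaluate(weights, features)
    return weights


# Hypothetical features for a group that survives at the end of the game.
weights = [0.0, 0.0, 0.0]
game = [[1.0, 0.0, 1.0], [1.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
before = evaluate(weights, game[-1])
for _ in range(200):
    td0_replay_backwards(weights, game, final_safety=1.0)
after = evaluate(weights, game[-1])
```

Replaying from the end means the final reward reaches earlier positions within a single pass instead of needing one pass per step.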
Influence evaluation
Consider random walks from an intersection
How likely to end up at a black or white stone?
Can also take account of group safety estimates
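A Monte Carlo version of the random-walk idea can be sketched as follows. The board encoding (`"B"`, `"W"`, `"."`), the walk count, the step limit, and the function names are assumptions; the slides do not say whether Honte samples walks or computes the probabilities another way, and the group-safety weighting mentioned above is left out.

```python
import random


def neighbors(row, col, size):
    steps = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [(r, c) for r, c in steps if 0 <= r < size and 0 <= c < size]


def random_walk_influence(board, start, walks=2000, max_steps=20, seed=0):
    """Estimate how often a random walk from `start` first reaches a black
    or a white stone. Returns (black_fraction, white_fraction)."""
    rng = random.Random(seed)
    size = len(board)
    hits = {"B": 0, "W": 0}
    for _ in range(walks):
        row, col = start
        for _ in range(max_steps + 1):
            stone = board[row][col]
            if stone in hits:
                hits[stone] += 1
                break
            row, col = rng.choice(neighbors(row, col, size))
        # Walks that find no stone within max_steps count for neither side.
    return hits["B"] / walks, hits["W"] / walks


# Hypothetical 5x5 position: black on the upper left edge, white on the lower right.
board = [
    "B....",
    "B....",
    ".....",
    "....W",
    "....W",
]
black_near, white_near = random_walk_influence(board, (1, 1))
black_far, white_far = random_walk_influence(board, (3, 3))
```

On this position the point next to the black stones gets mostly black influence and the point next to the white stones mostly white, which is the intuition the slide describes.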
Territory evaluation
Another multilayer perceptron
4 inputs: revised influence (for both sides), distance from edge
10 hidden nodes
1 output: predicted territory value
Trained by TD(0) using eventual territory value as reward signal
Playing strength
Playing 19x19 Go: approximately even against Handtalk 97-06e; wins more than 50% against Ego 1.0
Weaknesses: confuses group safety with group strength; has no concept of the aji of a group
Recent work
New version of WinHonte 1.03
Neural net to evaluate sente/gote
Trial version available online!
Conclusions
Go knowledge can be learned
Combining different forms of knowledge can be a good idea
Multilayer perceptrons provide a flexible representation
Local search can be used effectively as input features for learning