Honte, a Go-Playing Program Using Neural Nets

Posted on 05-Jan-2016



Frederik Dahl

Combined approach

Supervised learning: shape evaluation

Reinforcement learning: group safety, territory

Heuristic evaluation: influence

Search: capture, connectivity, life and death

Architecture

Shape evaluation: Multilayer perceptron

190 inputs: receptive field of radius 3, distance to edge, liberties, captured stones

50 hidden nodes, single output: will an expert play here?
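A minimal sketch of the shape network's forward pass, assuming sigmoid hidden and output units and small random initial weights; the slides give only the layer sizes (190-50-1). `ShapeNet`, `evaluate` and `best_point` are hypothetical names, and the 190-value feature encoding is left abstract because the slides do not specify it.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class ShapeNet:
    """Hypothetical 190-50-1 perceptron; sigmoid units and weight init are assumptions."""

    def __init__(self, n_inputs: int = 190, n_hidden: int = 50, seed: int = 0) -> None:
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(n_inputs)
        self.w_hidden = [[rng.uniform(-scale, scale) for _ in range(n_inputs)]
                         for _ in range(n_hidden)]
        self.b_hidden = [0.0] * n_hidden
        self.w_out = [rng.uniform(-scale, scale) for _ in range(n_hidden)]
        self.b_out = 0.0

    def evaluate(self, features) -> float:
        """Score in (0, 1): how likely an expert is to play at this point."""
        if len(features) != len(self.w_hidden[0]):
            raise ValueError("wrong number of inputs")
        hidden = [sigmoid(sum(w * x for w, x in zip(row, features)) + b)
                  for row, b in zip(self.w_hidden, self.b_hidden)]
        return sigmoid(sum(w * h for w, h in zip(self.w_out, hidden)) + self.b_out)


def best_point(net: ShapeNet, candidates: dict):
    """candidates: maps each legal board point to its 190-value feature vector."""
    return max(candidates, key=lambda point: net.evaluate(candidates[point]))
```

Ranking every legal point by this score is also what lets the network act as a move filter for the selective local search described later in the slides.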


Shape evaluation: Training and performance

Trained on 400 expert games

The expert move is used as a positive example (target 1) and a random legal move as a negative example (target 0)

Error backpropagation, with error = target - eval
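A sketch of the training scheme as the slide describes it: each expert move is a positive example with target 1, a random legal move from the same position is a negative example with target 0, and the weights follow the backpropagated gradient of the squared error, whose output error is `target - eval`. The tiny network size, sigmoid units, learning rate, epoch count and the toy 4-value feature vectors are assumptions for illustration; `TinyMLP` and `train_on_pairs` are hypothetical names.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class TinyMLP:
    """One hidden layer of sigmoid units and a sigmoid output (hypothetical stand-in)."""

    def __init__(self, n_inputs: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return y, h

    def train_step(self, x, target: float, lr: float = 0.5) -> float:
        """One backpropagation step on squared error; returns error = target - eval."""
        y, h = self.forward(x)
        error = target - y
        delta_out = error * y * (1.0 - y)  # output error times sigmoid derivative
        for j, hj in enumerate(h):
            # The hidden delta uses the output weight before it is updated.
            delta_h = delta_out * self.w2[j] * hj * (1.0 - hj)
            self.w2[j] += lr * delta_out * hj
            self.w1[j] = [w + lr * delta_h * xi for w, xi in zip(self.w1[j], x)]
            self.b1[j] += lr * delta_h
        self.b2 += lr * delta_out
        return error


def train_on_pairs(net: TinyMLP, pairs, epochs: int = 200, lr: float = 0.5) -> None:
    """pairs: (expert_move_features, random_legal_move_features) per training position."""
    for _ in range(epochs):
        for expert_x, random_x in pairs:
            net.train_step(expert_x, 1.0, lr)  # expert move: positive example
            net.train_step(random_x, 0.0, lr)  # random legal move: negative example


# Toy usage: in these made-up vectors, the first feature marks the expert's move.
net = TinyMLP(n_inputs=4, n_hidden=3)
pairs = [([1, 0, 1, 0], [0, 1, 0, 1]), ([1, 1, 0, 0], [0, 0, 1, 1])]
train_on_pairs(net, pairs)
```

Drawing the negative example from a random legal move keeps the two classes balanced without having to label every non-expert move as bad.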

Performance is measured by treating the network's prediction as an evaluation function: what percentage of legal moves are ranked below the expert move?
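The performance measure can be computed directly. This sketch assumes the network's scores for every legal move in a position are available as a dictionary, excludes the expert move itself from the denominator, and counts ties as not below; the slides do not say how ties or the denominator were handled. Both function names are hypothetical.

```python
def percent_ranked_below(scores: dict, expert_move) -> float:
    """Percentage of the other legal moves scored strictly below the expert move.

    scores: maps each legal move in the position to the network's evaluation.
    """
    expert_score = scores[expert_move]
    others = [s for move, s in scores.items() if move != expert_move]
    if not others:
        return 100.0  # the expert move was the only legal move
    below = sum(1 for s in others if s < expert_score)
    return 100.0 * below / len(others)


def mean_percent_ranked_below(positions) -> float:
    """positions: list of (scores, expert_move) pairs, e.g. one per test position."""
    values = [percent_ranked_below(scores, move) for scores, move in positions]
    return sum(values) / len(values)
```

A network that ranked moves at random would average about 50% on this measure, and one that always ranked the expert move first would reach 100%.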

Shape evaluation: Results


Local search

Selective search for local goals: capture, connectivity, life and death

Only considers moves suggested by the shape-evaluating network

Deep and narrow search

Captures common-sense knowledge
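Below is a minimal sketch of a deep, narrow AND/OR search for one local goal (here, capturing a block), in which only the `width` moves rated highest by a scoring function are expanded. The scoring function stands in for the shape-evaluating network, and the toy "capture race" rules (the `toy_*` functions and `RULES`) are hypothetical; a real implementation would use Go rules and Honte's own goal tests.

```python
ATTACKER, DEFENDER = "attacker", "defender"


def candidates(state, side, rules, width):
    """Keep only the `width` moves the scoring function rates highest."""
    _, moves, _, score = rules
    ranked = sorted(moves(state, side), key=lambda m: score(state, m), reverse=True)
    return ranked[:width]


def attacker_wins(state, depth, rules, width):
    """Attacker to move: succeeds if SOME candidate move forces the goal within `depth` plies."""
    goal, _, play, _ = rules
    if goal(state):
        return True
    if depth == 0:
        return False
    return any(defender_loses(play(state, m), depth - 1, rules, width)
               for m in candidates(state, ATTACKER, rules, width))


def defender_loses(state, depth, rules, width):
    """Defender to move: the goal is forced only if EVERY candidate reply still loses."""
    goal, _, play, _ = rules
    if goal(state):
        return True
    if depth == 0:
        return False
    return all(attacker_wins(play(state, m), depth - 1, rules, width)
               for m in candidates(state, DEFENDER, rules, width))


# Hypothetical toy rules: state = (liberties of the target block, escape moves left).
def toy_goal(state):
    return state[0] == 0  # target block captured


def toy_moves(state, side):
    if side == ATTACKER:
        return ["fill"]
    return (["extend"] if state[1] > 0 else []) + ["pass"]


def toy_play(state, move):
    libs, escapes = state
    if move == "fill":
        return (libs - 1, escapes)
    if move == "extend":
        return (libs + 1, escapes - 1)
    return state  # pass


def toy_score(state, move):
    return 0.0 if move == "pass" else 1.0  # stand-in for the shape network's rating


RULES = (toy_goal, toy_moves, toy_play, toy_score)
```

With these rules, `attacker_wins((2, 1), 5, RULES, 2)` returns `True` and the same call with depth 3 returns `False`, because the defender's single escape delays the capture by two plies. Pruning the defender's replies is what keeps the search narrow, but it can also miss a saving move, so a narrow search may report a capture that a wider one would refute.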

Group safety evaluation: Multilayer perceptron

Groups are defined by connectable blocks

13 inputs, among them: number of stones in the group, number of liberties of the group, number of proven eyes, average opponent influence over the liberties

20 hidden nodes, 1 output: probability of group survival
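Two of the listed inputs, the stone and liberty counts, come from a flood fill over connected stones. This sketch assumes a board given as strings of `'B'`, `'W'` and `'.'`, and takes the group as a caller-supplied list of block anchor points, since the slides do not say how Honte decides which blocks are connectable. The eye count and opponent influence are passed in as precomputed values, only four of the 13 inputs are covered, and both function names are hypothetical.

```python
from collections import deque

STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def block_and_liberties(board, row, col):
    """Flood-fill the block containing (row, col); return (stones, liberties) as point sets.

    board: list of equal-length strings of 'B', 'W' and '.'.
    """
    color = board[row][col]
    if color not in ("B", "W"):
        raise ValueError("no stone at this point")
    rows, cols = len(board), len(board[0])
    stones, liberties = {(row, col)}, set()
    queue = deque([(row, col)])
    while queue:
        r, c = queue.popleft()
        for dr, dc in STEPS:
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue  # off the board
            if board[nr][nc] == ".":
                liberties.add((nr, nc))
            elif board[nr][nc] == color and (nr, nc) not in stones:
                stones.add((nr, nc))
                queue.append((nr, nc))
    return stones, liberties


def group_features(board, block_points, proven_eyes, avg_opponent_influence):
    """Four of the 13 inputs for a group made of the blocks at `block_points` (hypothetical)."""
    stones, liberties = set(), set()
    for r, c in block_points:
        block_stones, block_libs = block_and_liberties(board, r, c)
        stones |= block_stones
        liberties |= block_libs  # a liberty shared by two blocks is counted once
    return [len(stones), len(liberties), proven_eyes, avg_opponent_influence]
```

Taking the union of the blocks' liberty sets, rather than summing counts, avoids double-counting a point that two blocks of the same group share.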

Group safety evaluation: Temporal difference learning

Trained by self-play

The reward signal for a group is the average final safety of its stones (0 = captured, 1 = survived)

TD(0) is used, replaying games backwards. A very simple idea:

error = eval(next) - eval(now)
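A sketch of TD(0) with backward replay for one group's sequence of feature vectors. To stay short it uses a logistic model in place of the 13-20-1 perceptron; the model, learning rate and toy features are assumptions, and the names are hypothetical. The last position is compared with the actual reward (the average final safety of the group's stones), and every earlier position with the evaluation of the position after it. Replaying from the end means each target has just been updated, which lets information about the outcome reach earlier positions within a single pass.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class SafetyModel:
    """Logistic stand-in for the group-safety perceptron (an assumption, for brevity)."""

    def __init__(self, n_features: int) -> None:
        self.w = [0.0] * n_features
        self.b = 0.0

    def __call__(self, x) -> float:
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def update(self, x, error: float, lr: float) -> None:
        # Gradient step on squared error for a sigmoid output.
        y = self(x)
        g = lr * error * y * (1.0 - y)
        self.w = [wi + g * xi for wi, xi in zip(self.w, x)]
        self.b += g


def group_reward(final_stone_safety) -> float:
    """Average final safety of the group's stones: 0 = captured, 1 = survived."""
    return sum(final_stone_safety) / len(final_stone_safety)


def td0_backwards(model, trajectory, reward: float, lr: float = 0.5):
    """Update `model` on one group's positions, replayed from the end of the game."""
    errors = []
    last = len(trajectory) - 1
    for t in range(last, -1, -1):
        now = trajectory[t]
        # The final position is compared with the real outcome, earlier ones with eval(next).
        target = reward if t == last else model(trajectory[t + 1])
        error = target - model(now)  # error = eval(next) - eval(now)
        model.update(now, error, lr)
        errors.append(error)
    return errors


# Toy usage: three positions of one group; three of its four stones survived.
model = SafetyModel(n_features=2)
trajectory = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
reward = group_reward([1, 1, 0, 1])
first = td0_backwards(model, trajectory, reward)
for _ in range(2000):
    td0_backwards(model, trajectory, reward)
```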

Influence evaluation

Consider random walks from an intersection: how likely is a walk to end up at a black or a white stone?

Can also take account of group safety estimates
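A Monte Carlo sketch of the random-walk idea: from an empty intersection, repeatedly step to a uniformly chosen neighboring point until a stone is reached, and count how often that stone is black or white. The step rule, walk count, step cap and the optional `safety` weighting (a stone with survival probability p credits p to its owner and 1 - p to the opponent) are assumptions, since the slides do not give these details; the function name is hypothetical.

```python
import random

STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def random_walk_influence(board, start, walks=2000, max_steps=200, safety=None, seed=0):
    """Estimate (black_share, white_share) for random walks starting at `start`.

    board: list of equal-length strings of 'B', 'W' and '.'.
    safety: optional dict mapping a stone's (row, col) to its estimated survival probability.
    """
    rng = random.Random(seed)
    rows, cols = len(board), len(board[0])
    black = white = 0.0
    for _ in range(walks):
        r, c = start
        for _ in range(max_steps):
            if board[r][c] != ".":
                break  # reached a stone
            neighbors = [(r + dr, c + dc) for dr, dc in STEPS
                         if 0 <= r + dr < rows and 0 <= c + dc < cols]
            r, c = rng.choice(neighbors)
        color = board[r][c]
        if color == ".":
            continue  # no stone reached within max_steps: counts for neither side
        p = 1.0 if safety is None else safety.get((r, c), 1.0)
        if color == "B":
            black += p
            white += 1.0 - p
        else:
            white += p
            black += 1.0 - p
    return black / walks, white / walks
```

On the one-row board `["B.W"]`, walks from the middle point split roughly evenly between the two colors, while on `["B.."]` every walk eventually reaches the black stone.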

Territory evaluation

Another multilayer perceptron, 4 inputs: revised influence (for both sides), distance from edge

10 hidden nodes, 1 output: predicted territory value

Trained by TD(0), using the eventual territory value as the reward signal

Playing strength

Playing 19x19 Go: approximately even against Handtalk 97-06e; wins more than 50% of games against Ego 1.0

Weaknesses: confuses group safety with group strength; has no concept of the aji of a group

Recent work

New version: WinHonte 1.03

Neural net to evaluate sente/gote

Trial version available online!

Conclusions

Go knowledge can be learned

Combining different forms of knowledge can be a good idea

Multilayer perceptrons provide a flexible representation

Local search results can be used effectively as input features for learning
