counting the score: position evaluation in computer gommueller/ps/goeval.pdf · counting the score:...

12
Counting the Score: Position Evaluation in Computer Go Martin M¨ uller Department of Computing Science, University of Alberta Edmonton, Canada T6G 2E8 [email protected] Abstract Position evaluation is a critical component of Go programs. This paper de- scribes both the exact and the heuristic methods for position evaluation that are used in the Go program Explorer, and outlines some requirements for developing better Go evaluation functions in the future. 1 Introduction The standard approach to developing game-playing programs is minimax search using a fast full-board evaluation. However, in many early Go programs, position evaluation played only a minor role. Programs selected their moves without ever computing a full-board evaluation. Moves were generated by very selective, goal-oriented move generators which used heuristics to approximate the value of a move. Lack of speed and quality problems were the two main reasons for deviating from the standard approach in the game of Go: full-board position evaluation is complicated and slow, so early programs did not have the computer resources needed to perform any lookahead. The quality of play also played a major role: the effect of many good moves in Go is hard to measure by a position evaluation function. Examples are creating “good shape” and many kinds of necessary defensive moves. Unless the evaluation function is incredibly sophisticated, and can reliably handle subtle changes in the strength of stones, it will miss theeffect of such moves. Typically, they do not increase the territorial score of a Go position by much, or might even seem to decrease it in case a move defends against a hidden threat. In the author’s experience [9], a purely territorial evaluation leads to “greedy” play and overextensions. As described by Chen [7], many programs use a mix of move evaluation and position evaluation. For example, a program based on position evaluation can give a bonus or a penalty to those types of moves that are otherwise misevaluated. Several surveys on computer Go [5, 6, 15, 20] have described the many components of current programs, with an emphasis on knowledge representation and search tech- niques. Papers dealing with specific position evaluation methods are [3, 7, 19, 21, 22]. This article describes the position evaluation methods used in the author’s program 1

Upload: others

Post on 10-Mar-2020

15 views

Category:

Documents


0 download

TRANSCRIPT

CountingtheScore:PositionEvaluationinComputerGo

Martin MullerDepartmentof ComputingScience,Universityof Alberta

Edmonton,[email protected]

Abstract

Positionevaluationis a critical componentof Go programs. This paperde-scribesboth the exact and the heuristicmethodsfor positionevaluationthat areusedin theGo programExplorer, andoutlinessomerequirementsfor developingbetterGoevaluationfunctionsin thefuture.

1 Introduction

Thestandardapproachto developinggame-playingprogramsis minimaxsearchusinga fastfull-boardevaluation.However, in many earlyGoprograms,positionevaluationplayedonly a minor role. Programsselectedtheir moves without ever computingafull-board evaluation. Moves were generatedby very selective, goal-orientedmovegeneratorswhichusedheuristicstoapproximatethevalueof amove. Lackof speedandquality problemswerethetwo mainreasonsfor deviating from thestandardapproachin the gameof Go: full-board positionevaluationis complicatedandslow, so earlyprogramsdid not have thecomputerresourcesneededto performany lookahead.Thequality of play alsoplayeda majorrole: theeffect of many goodmovesin Go is hardto measureby a positionevaluationfunction.Examplesarecreating“goodshape”andmany kindsof necessarydefensivemoves.Unlesstheevaluationfunctionis incrediblysophisticated,andcanreliably handlesubtlechangesin the strengthof stones,it willmisstheeffect of suchmoves.Typically, they do not increasetheterritorial scoreof aGopositionby much,or mightevenseemto decreaseit in caseamove defendsagainsta hiddenthreat. In theauthor’s experience[9], a purely territorial evaluationleadsto“greedy”playandoverextensions.AsdescribedbyChen[7], many programsuseamixof move evaluationandpositionevaluation.For example,a programbasedon positionevaluationcangive a bonusor a penaltyto thosetypesof moves that areotherwisemisevaluated.

SeveralsurveysoncomputerGo[5, 6,15,20] havedescribedthemany componentsof currentprograms,with an emphasison knowledgerepresentationandsearchtech-niques.Papersdealingwith specificpositionevaluationmethodsare[3, 7, 19,21,22].This article describesthe position evaluationmethodsusedin the author’s program

1

Explorer. Therearetwo maindifferencesto previously describedapproaches:a mod-ularizedheuristicevaluationusing the conceptof zones, and an emphasison exactevaluationmethodsfor several typesof local situations.

Thestructureof thispaperis asfollows: Section2 describesthreeexactevaluationmethodsfor safeterritories,capturingracesandendgamesthat areusedin Explorer.Section3 dealswith Explorer’s zone-basedheuristicposition evaluation. Section4showshow a full-boardpositionevaluationis computedfrom thedifferentlocalevalu-ations,andSection5 discussesopenproblemsandfuturework onpositionevaluation.

2 Exact Evaluation in Explorer

Thissectiondescribesthreetypesof localsituationsfor whichExploreris ableto com-puteanexactevaluation:safeterritories,endgames,andcapturingraces(semeai).

2.1 Safe Territory

Figure1: Safestones,unsafeterritory.

Safeterritoriesare partsof a Go position that completelybelongto one player.They cannotbe reducedor destroyedby the opponentundernormal circumstances.Recognizingsafeterritoriesis usefulfor proving thatagameis over. It is alsorequiredfor exact mathematicalendgameanalysis,which startsby partitioning the boardintosmall independentendgameareasdividedby safeblocks[9, 11]. Safeterritoriesaregameswith a constantintegervalue.

Determiningthe safetyof a completelysurroundedterritory, and the stonessur-roundingit, is similar to solvingLife andDeathproblems.Onemaindifferenceis thatsafetyproofsfor largeareascannotuseanexhaustivesearchsincethestatespaceis toolarge. Anotherdifferenceis in the treatmentof coexistencein seki. While stonesaresafeif theopponentcannotcapturethem,territory is safeonly if it canbeproventhatno opponentstonescansurvive on the inside. Figure1, from [10], shows anexamplewheretheblackstonesaresafebut theareathatthey surroundis not. White1 threatensto capturethreeblack stones,so Black 2 is forced. After White 3, the final result iscoexistencein a seki,andtheblackterritoryhasbeendestroyed.

Benson[1] hasgivenamathematicalcharacterizationof unconditionallyaliveblocksof stones.Suchblockscannever becaptured,not evenby anarbitrarynumberof suc-cessive opponentmoves.For example,all blackblocksin Figure2 areunconditionally

2

Figure2: Benson’sunconditionalsafety.

Figure3: Proving thesafetyof blocksandterritories.

safe.All theblackstonesalwayshave at leasttwo adjacentemptysquares,andWhitecannotfill them without committing suicide. [10] introducedboth static rules andmethodsbasedon local searchfor detectinggroupsof blocksthat aresafeundertheusualalternatingplay, wherethedefenderis allowedto reply to theattacker’s moves.Thesetechniquescanbe usedto prove thesafetyof many moderatelylarge areas.Inwell-subdividedpositionsneartheendof agame,suchastheexampleshown in Figure3, every point on the boardcanquickly be proven safewith currenttechniques.Ex-plorerimplementsbothBenson’salgorithmandanenhancedversionof thetechniquesdescribedin [10].

2.2 Semeai - Capturing Races

Capturingraces(semeai)area typeof local Go situationswherevery specific,strongrulesfor evaluationandefficient searchexist [12, 14]. Explorercanexactly evaluatetwo kindsof capturingraces:

1. Semeaithat have alreadybeendecidedin oneplayer’s favor. The valueis an

3

Figure4: Semeaiwonby oneplayer, evaluatedstaticallyby Explorer.

Figure5: Semeaiwonby oneplayer, proventhroughsearchby Explorer.

integer, the sizeof the whole area,with the possibleexceptionof a numberofneutralpointson theoutside.Thesesemeaiareequivalentto safeterritories,butcannotberecognizedby theusualmethodsfor finding safeterritories.Explorercan find suchsemeaiby static rules, as in the examplesof Figure 4, or by asearch,asin theexamplegivenin Figure5.

2. For semeaiof simplestructure,namelythe classes0, 1 and2 definedin [12],Explorercancomputetheexactcombinatorialgamevalue.Many of thesesemeaivaluesare simple switches,while othersinvolve up to four different relevantoutcomes.SeeFigure6 for someexamples.

2.3 Endgame Areas

Combinatorialgametheoryprovidesarangeof endgameanalysistechniquesthatcom-putethe“bestpossible”evaluationatdifferentlevelsof precision.Many of thesetech-niquesare implementedin Explorer. For endgameswithout ko fights, the so-calledcanonicalform of a gamecanbecomputedby a local search[11]. This methodyieldscompleteinformationaboutgoodlocal plays.A morecompactrepresentationof localgamevaluesarethermographs.In asensethatcanbemademathematicallyprecise,themastvalueof a thermographis thebestpossiblestaticevaluationof anendgameareaby a singlenumber[2]. Similarly, the temperature of a thermographis a measureofurgency of playinga move in thatarea.

Thetheoryof thermographyhasbeenextendedby BerlekampandSpightto handle

4

Figure6: SemeaitestproblemsA, B andC, from [14].

Ko fights[2, 17,18]. Explorercancomputethethermographvaluesof local positionsinvolving a singleko at a time [13]. Bill Fraser’s programsGoSolverandBruteForcecananalyzemorecomplex ko situations[8].

Figure7: Dividers( � ) andpotentialdividers( � ) for Black andWhite.

3 Heuristic Territory Evaluation in Explorer

This sectiondescribestheterritory-relatedfeaturesof theprogramExplorer[9]. Theircombinationinto a full-boardscoreis discussedin Section4.

For evaluationpurposes,Explorerdistinguishesseveral different typesof points.The intermediatedatastructuresandfinal resultsof the computationare shown fora sampleposition from a recentcomputer-computergamein Figures7 to 11. Zones

5

Figure8: Safe(darkshade),potential(light shade),andthreatened( � ) zones.

Figure 9: Divider-adjusteddistancesfrom Black and White stones,with maximumdistance5.

(Figure8) arecomputedfrom blocksof stones,dividersandpotentialdividers(Figure7). Usingheuristicinformationsuchasthemapof connectiblepointsshown in Figure11, zonesareclassifiedassafeterritory, potential territory, threatenedzone, or un-usedzones. Pointsoutsideof territoriesarefurtherclassifiedinto nearpoints, junctionpoints, andfar-awaypoints, asshown in Figure10. Detailsof theprocessaredescribedbelow.

In Explorer, positionevaluationis performedin the laterstagesof the overall po-sition analysis.It requiresmostof the previously computedinformationasan input.Suchinformationincludestheresultsof tacticalanalysis,life anddeathsearches,andcomplex heuristicrulesto identify alive,weakanddeadgroupsof stones[9]. Positionevaluationitself is a four stepprocess:

1. Dividersandpotentialdividersarecomputedusingpatternmatching.Eachdi-vider or potentialdivider is a small objectwith attributessuchasits area,end-

6

Figure10: Near(shaded),junction( � ) andfar away ( � ) points.

Figure11: Connectiblepointsfor Black,White (shaded)andfor bothplayers( � ).

points, andstatus. The two endpointsof eachdivider areeitherstonesof thedivider’s color, or the edgeof the board. Figure7 shows summaryoverviewsof all pointscontainedin somedivider or potentialdivider, for Black and forWhite. Theheuristicstatusof groupsis usedin thisandfuturestepsfor filtering,to ignoredividersassociatedwith deadgroupsof stones.

2. Separatelyfor eachcolor, zonesandpotentialzonesarecomputednext, astheresultof a boardpartitioningprocessusingnon-deadstones,dividersand(onlyfor potentialzones)potentialdividersof thesamecolor. Usingfurtherheuristicinformationsuchastheconnectabilitymapshown in Figure11,asafetystatusofeachzoneis computed:Figure8 shows thesafe, potential, andthreatenedzonesfor bothplayers.

� Safe zonesare consideredsafe territory. They are surroundedonly bystonesand dividers,and the interior is strongly controlledby stones,as

7

measuredby a heuristic.� Potentialzonesare consideredpotentialterritory. They are weakerthan

safezones,eitherbecausepartof theboundaryis only a potentialdivider,or becausethecontrolover theinsideis not strongenough.An exampleisshown in thelower left cornerin theleft pictureof Figure8. Black’s largeareais surroundedby dividersandstones,but thecurrentheuristicjudgesthatthelargeinsideareais tooopento besafe,sotheareais only potentialterritory. This is a somewhatpessimisticjudgment.

� Zonesthatareevenweakerthanpotentialterritory areevaluatedasthreat-ened.For example,a largeareain thecenterright is looselysurroundedbyblackstones,but it hasa largenumberof weaknesses.

� Zonesthatcontainstrongopponentgroupsaremarkedasunused, sincetheplayerhasno scopefor developmentthere.An examplewould be thetopright cornerin Figure8. This wholecorneris surroundedon a large scaleby whitestones,dividerandpotentialdividers.However, it containsa safeblack cornergroup,so from White’s point of view this zoneis currentlyworthless.

Most points,except for zoneboundaries,arepart of both a black anda whitezone.

3. Next, divider-adjusteddistancesfrom both Black andWhite stonesarecalcu-lated, as shown in Figure 9. This distancefunction computesthe Manhattandistancefrom eachpoint to theneareststoneof thegivencolor. It stopsatoppo-nentstonesanddividers,andcomputesdistancesupto agivenmaximum,whichis setto 5 in thecurrentimplementation.

4. Givendistanceinformation,pointsthatarenot territoryor potentialterritory areclassifiedaseithernear to oneplayer, junctionpoints, or asfar awayfrom bothplayers.Theresultis shown in Figure10.

4 Full-board Evaluation

Explorer usesChineserules by default. For full-board evaluation, the exactly andheuristicallyevaluatedpartsof the boardare treateddifferently. For exactly evalu-atedparts,themeanvaluesof eachlocalpositionarecomputedandthenaddedup. Forsafeterritoriesanddecidedsemeai,this valueis identicalto thesizeof theterritory.

For theheuristicallyevaluatedrestof theboard,thenumberof pointsof eachtypeis computedfirst. Thecontribution to theoverall scoreis obtainedby multiplying thisnumberby a weight factorrangingfrom +1 for sureBlack pointsto -1 for sureWhitepoints.Theweightsarecurrentlysetasfollows:

1. safeterritory: eachpoint in a safeterritory countsas +1 for Black, or -1 forWhite, towardsthetotal score.

2. potentialterritory: eachpointcountsas������

.

8

3. Eachnearpoint is countedas�� ���

.

4. Junctionpointsandfar-awaypointshaveaweightof zero,they donotcontributeto thescore.

Thefinal scoreis thesumof theboardscoreplus thekomi. WhenusingJapaneserules,adjustmentsaremadefor handicapstones.

5 Open Problems and Future Work

Thissectiondescribessomeideasthatagoodevaluationfunctionfor Goshouldimple-ment,but thatarenot yet includedin Explorer.

5.1 Maximize the Chance of Winning, Not the Score

In point-scoringgamessuchasGo it is naturalto approximatetheexpectedfinal scorein the evaluationfunction. However, it is importantto takethe overall situationintoaccountto maximizethechanceof winning.

Figure12: Maximizing thescorevsmaximizingthechanceof winning.

Figure12 shows two examplepositions.In bothpositions,Blackcantry to kill thegroupat thetopwith move a, anall-out attackthathasareasonablechanceof success.Black’s alternative is to play a defensive move at b andlet White live. If Black playsa, White will cut at c and thrasharoundin Black’s area,trying to makethe groupalive. If White succeeds,Black will suffer a big loss. Black’s beststrategy dependson anassessmentof theoverall position. On the left, Black canwin easilyandsafelyby playing the defensive move at b. On theright, Black is behindin territory, so theattackat a is theonly chanceto win. No matterhow theunstablepositionsresultingaftermove a areevaluated,it is impossiblefor ascore-maximizingprogramto find thecorrectstrategy in bothcases.It is essentialto assessthewinningprobability instead.

The authorbelievesthat several Go programs,including Michael Reiss’Go4++,alreadyimplementsimilar ideas.Otherprogramsincludeat leastsomesimplerelatedstrategies,suchaschangingtheweightsof aggressive or defensivemovesaccordingtothescoreestimate.

9

5.2 Representing Evaluation Uncertainty

In gamessuchaschess,theideaof quiescencesearchis essentialfor improving evalu-ationquality. Positionsof high volatility shouldnotbeevaluated,but searcheddeeper.The sameprincipleholds for Go. For example,the global searchin David Fotland’sprogramTheManyFacesof Go mainly functionsasa quiescencesearch:only movesthatsimplify thestatusof weakgroups(save,kill) areconsideredin deeperstagesof thesearch.Bruno Bouzy [4] hasproposedto usefuzzy functionsto representevaluationuncertaintyin Go.

5.3 Threat Evaluation

Figure13: Proving thesafetyof blocksandterritories.

The heuristicsafetyevaluationof territoriesis basedon the assumptionthat theopponent’s moves can be answeredlocally. Definitions of individual territoriesareusually not robust with regard to doublethreats. Even if two regions areeachsafeby themselves, theremight be a move that threatensboth at the sametime. Figure13 shows anexample. Eachcorner, takenby itself, is a safeterritory for White. Theblackstonesinsidecanneitherlivenorconnectto theoutside.However, move 1 in thepictureon theright is a doublethreat.Black will beableto connectoneside,eitherata or at b, anddestroyoneof thecornerterritories.

The authorbelieves that several of the strongerGo programsperform at leastalimited kind of threatanalysis. However, theredo not seemto be any publicationsaboutsuchmethods.

5.4 Some Further Suggestions for Future Work on Evaluation inGo

Hereis an(incomplete)list of furtherresearchtopicsin Gopositionevaluation.

� Developmethodsto measurethequality of anevaluationfunction.

� For eachgamestage,developatestsetof full-boardpositionswith detailedeval-uationsprovidedby humanmasterplayers.

� Useterritorial evaluationasa secondarycriterionin goal-orientedsearches.Forexample,a Life andDeathsolver couldbemodifiedto alsomaximizethe terri-tory of thegroup,or to capturea groupon aslargea scaleaspossible.

10

� Furtherautomatemathematicalendgameanalysistechniquesandtheir applica-tion to computerGo.

� Develop smoothtransitionfunctionsbetweenexact and heuristicevaluations,thatcanusea mix of bothevaluationcomponents.

� Generalizethemethodsfor exactevaluationof territories,semeaiandendgames.

� Modify Life and Deathenginessuchas ThomasWolf ’s GoTools to generateexactevaluationsof local situations.

� Like doublethreats,ko fightsareanothercasewherenormalterritorydefinitionscanbeoverturned.In orderto win a ko, it maybenecessaryto allow theoppo-nenttwo movesin a row. Many otherwisesafestructurescollapseunderthesecircumstances.ConceptssuchasBenson’sunconditionallife, PopmaandAllis’X-Life [16], andTajima andSanechika’s PossibleOmissionNumber[19] mayform a basisto developmorerefinedevaluationshere.

References

[1] D.B. Benson. Life in the gameof Go. InformationSciences, 10:17–29,1976.Reprintedin ComputerGames,Levy, D.N.L. (Editor), Vol. II, pp. 203-213,SpringerVerlag,New York 1988.

[2] E. Berlekamp.Theeconomist’sview of combinatorialgames.In R.Nowakowski,editor, Gamesof No Chance: CombinatorialGamesat MSRI, pages365–405.CambridgeUniversityPress,1996.

[3] B. Bouzy. Modelisationcognitivedu joueurdeGo. PhDthesis,UniversiteParis6, 1995.

[4] B. Bouzy. Thereareno winning movesexcept the last. In ProceedingsIPMU,1996.

[5] B. Bouzy andT. Cazenave. ComputerGo: An AI-oriented survey. ArtificialIntelligence, 132(1):39–103, 2001.

[6] J.BurmeisterandJ.Wiles. AI techniquesusedin computerGo. In Fourth Con-ferenceof theAustralasianCognitiveScienceSociety, Newcastle,1997.

[7] K. Chen. Somepractical techniquesfor global searchin Go. ICGA Journal,23(2):67–74,2000.

[8] B. Fraser. Computer-AssistedThermographic Analysisof Go Endgames. PhDthesis,Universityof Californiaat Berkeley, forthcoming.

[9] M. Muller. ComputerGoasa Sumof LocalGames:AnApplicationof Combina-torial GameTheory. PhDthesis,ETH Zurich,1995.Diss.ETH Nr. 11.006.

11

[10] M. Muller. Playing it safe: Recognizingsecureterritoriesin computerGo byusingstaticrulesandsearch.In H. Matsubara,editor, GameProgrammingWork-shop in Japan ’97, pages80–86,ComputerShogi Association,Tokyo, Japan,1997.

[11] M. Muller. Decompositionsearch:A combinatorialgamesapproachto gametreesearch,with applicationsto solvingGo endgames.In IJCAI-99, pages578–583,1999.

[12] M. Muller. Raceto capture:Analyzing semeaiin Go. In GameProgrammingWorkshopin Japan’99, volume99(14)of IPSJSymposiumSeries, pages61–68,1999.

[13] M. Muller. Generalizedthermography:A new approachtoevaluationin computerGo. In J. vandenHerik andH. Iida, editors,Gamesin AI Research, pages203–219,Maastricht,2000.UniversiteitMaastricht.

[14] M. Muller. Partial orderbounding:A new approachto evaluationin gametreesearch.Artificial Intelligence, 129(1-2):279–311,2001.

[15] M. Muller. ComputerGo. Artificial Intelligence, 134(1–2):145–179,2002.

[16] R. PopmaandL.V. Allis. Life anddeathrefined. In H.J.vandenHerik andL.V.Allis, editors,HeuristicProgrammingin Artificial Intelligence3, pages157–164.Ellis Horwood,1992.

[17] W. Spight.Extendedthermographyfor multiplekosin Go. In H.J.vandenHerikandH. Iida, editors,ComputersandGames.ProceedingsCG’98, number1558inLectureNotesin ComputerScience,pages232–251.SpringerVerlag,1999.

[18] W. Spight. Go thermography- the 4/21/98 Jiang-Rui endgame. InR. Nowakowski,editor, MoreGamesof NoChance. CambridgeUniversityPress,2002.

[19] M. Tajima and N. Sanechika. Estimatingthe PossibleOmissionNumber forgroupsin Go by the numberof n-th dame. In H.J.van denHerik andH.Iida,editors,Computersand Games: ProceedingsCG’98, number1558 in LectureNotesin ComputerScience,pages265–281.SpringerVerlag,1999.

[20] B. Wilcox. ComputerGo. In D.N.L. Levy, editor, ComputerGames, volume2,pages94–135.Springer-Verlag,1988.

[21] S.-J.Yan,W.-J. Chen,andS.C.Hsu. Designandimplementationof a heuristicbeginninggamesystemfor computerGo. In JCIS’98, volume1, pages381–384,1998.

[22] S.-J.YanandS.C.Hsu. A positionaljudgmentsystemfor computerGo. In H.J.van denHerik andB. Monien, editors,Advancesin ComputerGames9, pages313–326,MaastrichtandPaderborn,2001.

12