A thesis submitted in conformity with the requirements for the degree of Master of Applied Science,
Graduate Department of Electrical and Computer Engineering, University of Toronto
Abstract
Parallel Alpha-Beta Search on Shared Memory Multiprocessors

Valavan Manohararajah

Master of Applied Science
Graduate Department of Electrical and Computer Engineering
University of Toronto
2001
The alpha-beta algorithm is a well known method for the sequential search of game trees. Two methods, young brothers wait concept and dynamic tree splitting, have been used successfully in parallel game tree search. First, this work introduces the notion of an exponentially ordered game tree as a model for the game trees encountered in practice. Second, exponentially ordered trees are used in the study of the tree splitting methods used by young brothers wait concept and dynamic tree splitting. Finally, a new tree splitting method based on neural networks is introduced and is found to outperform the other two methods on certain types of trees.
Dedication

To my parents, Amma and Appa.
Acknowledgements

First, I would like to thank my supervisor, Professor Z. G. Vranesic, for his advice, guidance and support. His continual encouragement when I was faced with technical difficulties helped me push through and discover the right solutions. He was always willing to set aside any amount of time for a discussion on thesis matters.

I am grateful to both OGS and NSERC for providing the means to further my education.

Last, but certainly not least, I would like to thank my wife, Abiramy, for her love and support during the course of this thesis.
5.4 Neural Network Structure and Back-Propagation . . . 50
5.5 Performance of Node Classification Schemes . . . 52

6 Experiments on a Parallel Alpha-Beta Simulator 58
6.1 Introduction . . . 58
6.2 A Simplified Shared Memory Multiprocessor . . . 58
6.3 Multiprocessor Simulation . . . 60
6.4 The Parallel Alpha-Beta Simulator (PABSim) . . . 63
6.5 Split-Point Selection Schemes . . . 73
6.6 Performance of Split-Point Selection Schemes . . . 74

7 Conclusions and Future Work 86
7.1 Conclusions . . . 86
7.2 Future Work . . . 86

A Seeds Used For Artificial Tree Generation 87
A.1 Set 1.tg . . . 87
A.2 Set 2.tg . . . 87

Bibliography 87
Chapter 1
Introduction
The seminal paper by Claude E. Shannon [2] introduced the concept of game trees along with a simple algorithm (mini-max) for searching them. Shannon was primarily interested in creating a computer player for the game of chess. Here we consider the trees that arise from a particular class of games. Each game in this class has the following properties:

• There are two players involved in the game: player 1 and player 2.

• The two players take turns making moves. At any position in the game, a finite number of moves are available to the player on move.

• The game is deterministic; there are no elements of chance in the game.

• It is a game of perfect information. That is, both players know the entire state of the game at all times. For example, chess is a game of perfect information. For the duration of the game, both players know the board position. However, 2-player poker is not a game of perfect information. Although player 1 can see his cards, he cannot see the cards that player 2 holds.

• There are three possible outcomes in the game: a win for player 1, a win for player 2 or a draw. Games that do not end in a draw are also included in the class. These games have two possible outcomes: a win for player 1 or a win for player 2.
The construction and the subsequent search of game trees forms the basis of many computer programs designed to play two player strategy games. A game tree is a way of representing the possibilities that are available to the players involved in the game. The search of the game tree yields the optimal sequence of play for both sides.
Efficient algorithms for the sequential search of game trees have been in existence for a long time (since 1963 [12]). In a tree where good moves are searched before bad ones, a good sequential tree search algorithm will examine a fraction of the entire tree. On a parallel system, sequential algorithms are extended to allow several processors to share the search effort. Initial attempts at parallel search had limited success on a large number of processors [17] but more recent efforts have demonstrated that a large number of processors can be used effectively [8]. One of the biggest difficulties in parallel search is in determining where multiple processors can be used to split up the search effort. The main focus of this work is to explore alternative ways of choosing where to split the tree. A comparison of the tree splitting techniques in current use was not previously available. This work produces such a comparison using artificially generated trees. A new tree splitting technique is also introduced and its performance is compared to previously existing techniques.
Chapter 2 explores the different algorithms available for sequential tree search. Three approaches to parallel search are examined in chapter 3. Chapter 4 introduces the concept of artificially generated trees. A new node classification technique is introduced in chapter 5 and its performance is compared to existing techniques using a sequential tree searcher. Chapter 6 studies the performance of three tree splitting techniques within a parallel tree searcher.
Chapter 2
Game Tree Search
2.1 Introduction
A simple tree for a two-player game is presented in Figure 2.1. A node in the tree represents a position in the game while a branch represents a move available at a particular position. Player 1 is on move at nodes with the rectangular graphic and player 2 is on move at nodes with a circle graphic. For example, at the Root node, player 1 is on move and the player has two moves available: a and b. Each leaf node has been assigned a score that indicates how valuable that position is. A positive score indicates that player 1 is winning while a negative score indicates that player 2 is winning; a score of 0 indicates a draw. The magnitude of the score conveys important information as well. The higher the score, the more favorable a position is for player 1. Similarly, the lower the score, the more favorable a position is for player 2. Note that this scoring scheme is arbitrary and there are several possible schemes that can be used as long as the scoring scheme allows one to distinguish which of the two players is winning.
While the above discussion involves a tree for a two-player game, a game tree can be constructed for a large class of problems that doesn't involve games. All that a game tree requires is that there be two opposing forces and that they occupy alternate levels in the tree.

The problem in game tree search is to find the game tree value. The value of a game tree is the score of the leaf node that is reached when both sides exercise their best options. From a practical viewpoint, what one really needs to find is the option at the root that leads to the game tree value. In the case of the tree in Figure 2.1, the best option is the move that gives player 1 the best chance of winning assuming that both players are playing perfectly. Keeping track of the path that leads to the game tree value is a trivial enhancement to an algorithm that determines this value. In the discussions that follow, for simplicity, it is assumed that the discovery of the game tree value is equivalent to the discovery of the path that leads to that value.
Figure 2.1: A simple game tree.
Consider the problem of finding the game tree value for the tree in Figure 2.1. Ideally, player 1 wants to guide the game towards position T because that has the highest score from his standpoint. Assume that player 1 plays move b in order to start making progress toward position T. As far as player 2 is concerned, move f plays right into player 1's hands. Player 2 obtains a better position by playing move g. If player 2 chooses move g then player 1 follows it up with move u and the final position, U, has a score of -1. Thus, player 1's original plan of guiding the game towards T can be easily thwarted by a careful player 2. Consider what happens when player 1 chooses move a. Now, if player 2 picks c then player 1 chooses i and we obtain a score of 7. However, if player 2 plays d then player 1 chooses l and the final position has a score of 4. A similar analysis on e shows that it leads to a score of 5. Clearly, player 2 should choose d. So the sequence of moves, assuming perfect play by both sides, is a, d, l. This sequence will be referred to as the principal variation. The term principal variation is used to describe the sequence of moves that lead to the game tree value. For the tree in Figure 2.1, the game tree value is 4.
2.2 Definitions
All algorithms described in this work expand the branches at a node in a left to right order. A branch or child node, m, is said to come before another branch or child node, n, if m is to the left of n. A branch or child node is said to be the first at a node if it is in the leftmost position.

In addition to the search order defined above, the algorithms search the tree in a depth first manner. Depth first techniques for game tree search require very little memory and the memory requirement does not grow exponentially with the tree size.
MINIMAX(node)
1: if node.depth = 0 then
2:   return EVALUATE(node)
3: if node.type = max then
4:   score ← -∞
5: else
6:   score ← +∞
7: for i ← 1 to node.branch.length
8:   new_node ← TRAVERSE(node, node.branch[i])
9:   value ← MINIMAX(new_node)
10:  if node.type = max then
11:    if value > score then
12:      score ← value
13:  else
14:    if value < score then
15:      score ← value
16: return score

Figure 2.2: The mini-max algorithm.
2.3 The Mini-Max Algorithm
In the tree of Figure 2.1, at the nodes where player 1 is on move, player 1 will select the move that maximizes his/her score. Similarly, at the nodes where player 2 is on move, player 2 will select the move that minimizes his/her score. Thus, we can classify the nodes of a game tree as being one of two types: maximizing or minimizing. This observation leads directly to the mini-max algorithm in Figure 2.2.

Depending on whether a node is maximizing or minimizing, the algorithm keeps track of the largest or the smallest score, respectively. A leaf node is reached when the remaining depth (node.depth) is equal to zero. At a leaf node, the EVALUATE function is called to determine the score associated with the node.

In performing its work, the mini-max algorithm explores every node in the game tree. Consider the application of this algorithm to chess. On average, a chess position has 32 possible moves (refer to Section 4.3). A tree of depth n would contain 32^n leaf nodes. Clearly, the mini-max algorithm is not practical for a chess tree when the depth exceeds 5.
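The mini-max procedure of Figure 2.2 can also be expressed in runnable form. The following is a minimal Python sketch added for illustration; nested lists stand in for the node/TRAVERSE interface used in the figure, and the example tree is hypothetical:

```python
from math import inf

def minimax(node, maximizing):
    """Mini-max over a tree of nested lists (cf. Figure 2.2).

    A leaf is a plain number; a maximizing node keeps the largest
    child score, a minimizing node keeps the smallest.
    """
    if not isinstance(node, list):      # leaf: return its static score
        return node
    score = -inf if maximizing else inf
    for child in node:
        value = minimax(child, not maximizing)
        if maximizing:
            if value > score:
                score = value
        else:
            if value < score:
                score = value
    return score

# Hypothetical example: a minimizing root with three maximizing children.
tree = [[7, 1, -9], [4, -2, -3], [5, -1, 3]]
print(minimax(tree, maximizing=False))  # → 4
```

Every node of the tree is visited, which is what makes the plain procedure impractical at large depths.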
NEGAMAX(node)
if node.depth = 0 then
  return EVALUATENEGAMAX(node)
score ← -∞
for i ← 1 to node.branch.length
  new_node ← TRAVERSE(node, node.branch[i])
  value ← -NEGAMAX(new_node)
  if value > score then
    score ← value
return score

Figure 2.3: The nega-max algorithm.
2.4 The Nega-Max Formulation

The mini-max algorithm can be simplified by eliminating the distinction between maximizing and minimizing nodes. By simply negating the result returned from the recursive call in Figure 2.2, each node can be treated as a maximizing node. However, another modification is necessary to achieve the same result as the mini-max algorithm. At a leaf node, the EVALUATE function has to return a score from the viewpoint of the player on move. For example, consider the case of a leaf node that has a score of -6 in the original mini-max scheme. The score indicates that the position favors player 2. In the new scheme, if player 1 is on move, a score of -6 would be returned. However, if player 2 is on move, then a score of 6 would be returned. The evaluation function that implements this functionality will be referred to as EVALUATENEGAMAX. Figure 2.3 illustrates the nega-max algorithm [13] that is obtained when one implements the changes described.
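A minimal Python sketch of the nega-max scheme, added for illustration (the nested-list tree is a hypothetical example, with leaves placed at even depth so that their on-move player coincides with the player at the root):

```python
from math import inf

def negamax(node):
    """Nega-max (cf. Figure 2.3): every node is treated as maximizing
    and the child's result is negated. Leaf values are assumed to
    follow the EVALUATENEGAMAX convention, i.e. they are given from
    the viewpoint of the player on move at the leaf."""
    if not isinstance(node, list):
        return node
    score = -inf
    for child in node:
        value = -negamax(child)
        if value > score:
            score = value
    return score

# Hypothetical tree; leaves sit at even depth, so the on-move player at
# each leaf is the same as the player at the root.
tree = [[7, 1, -9], [4, -2, -3], [5, -1, 3]]
print(negamax(tree))  # → -1, the mini-max value for a maximizing root
```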
2.5 The Alpha-Beta Algorithm

A closer examination of the mini-max algorithm reveals possible enhancements to the basic technique. Table 2.1 illustrates the progress of the algorithm on the tree of Figure 2.1. It is assumed that the expansion at each node proceeds in a left to right order. We start at the Root node, which initially has a score of -∞, and the exploration begins with branch a. Node A starts with a score of +∞ since it is a minimizing node. The process of recursive calls continues until leaf node I is expanded. Here the recursion stops and a value of 7 is returned. At node C, the old value, -∞, is overwritten with the 7 that is returned by node I. Eventually, node A obtains a score of 4 at step 18. Search at node A proceeds with the traversal of branch e. Node E, being a maximizing node, has an initial score of -∞. On exploring branch o, node E obtains a score of 5. This is where the enhancement can be made. Since node E is a maximizing node, the score can only go higher than 5. However, it is also known that at node A, a minimizing node, the score is 4. Node A will reject any value that is greater than or equal to 4. Thus, the unexplored branches rooted at node E can be eliminated from the search since they will have no effect on the score at node A. Such an elimination is termed a cut-off in game tree parlance. There must be communication between the adjacent levels in the tree in order to determine when search at a node is no longer necessary. The enhanced version of the mini-max algorithm, which is referred to as the weak alpha-beta algorithm [8], is illustrated in Figure 2.4. The best score obtained at any node is passed down to the successor as a bound so that cut-offs can be made.
The weak alpha-beta algorithm still misses some cut-offs. Consider the application of the algorithm to the tree of arbitrary depth as illustrated in Figure 2.5. After node A has been explored, the Root will have a score of 4. Node B will receive 4 as a bound from the Root. Since no search has been done at node B, node C receives an infinite bound. Similarly node E also receives an infinite bound from node C. However, the bound that was applied at node B still applies to node E since the Root will reject any score that is less than or equal to 4. In this particular case, additional cut-offs can be made during the search of the subtree rooted at E if the highest score obtained at a maximizing node is carried downwards as a bound. A similar argument can be made for the smallest score at minimizing nodes. Thus, the algorithm can be enhanced further by maintaining two bounds:

• alpha (lower bound): Keeps track of the highest score obtained at a maximizing node higher up in the tree and is used to perform cut-offs at minimizing nodes.

• beta (upper bound): Keeps track of the lowest score obtained at a minimizing node higher up in the tree and is used to perform cut-offs at maximizing nodes.

The resulting technique is referred to as the alpha-beta [12] algorithm and is summarized in Figure 2.6. To determine the game tree value, the algorithm is invoked with the call ALPHABETA(Root, -∞, +∞). The pair of numbers, (-∞, +∞), defines the search window.

An illustration of the cut-offs achieved by the alpha-beta algorithm is shown in Figure 2.7. At node E, branches p and q are eliminated. When the search arrives at node B, there is a lower bound of 4. On exploring F, node B obtains a score of 8. A similar exploration of G yields -1. This falls below the lower bound and the search is terminated at node B; node H is then cut off.
Step   Node   Score   Action
 1     Root    -∞     Explore a
 2     A       +∞     Explore c
 3     C       -∞     Explore i
 4     I        7     Return 7
 5     C        7     Explore j
 6     J        1     Return 1
 7     C        7     Explore k
 8     K       -9     Return -9
 9     C        7     Return 7
10     A        7     Explore d
11     D       -∞     Explore l
12     L        4     Return 4
13     D        4     Explore m
14     M       -2     Return -2
15     D        4     Explore n
16     N       -3     Return -3
17     D        4     Return 4
18     A        4     Explore e
19     E       -∞     Explore o
20     O        5     Return 5
21     E        5     Explore p
22     P       -1     Return -1
23     E        5     Explore q
24     Q        3     Return 3
25     E        5     Return 5
26     A        4     Return 4

Table 2.1: Partial analysis of how the mini-max algorithm explores the tree of Figure 2.1.
WEAKALPHABETA(node, bound)
1: if node.depth = 0 then
2:   return EVALUATE(node)
3: if node.type = max then
4:   score ← -∞
5: else
6:   score ← +∞
7: for i ← 1 to node.branch.length
8:   new_node ← TRAVERSE(node, node.branch[i])
9:   value ← WEAKALPHABETA(new_node, score)
10:  if node.type = max then
11:    if value ≥ bound then
12:      return bound
13:    if value > score then
14:      score ← value
15:  else
16:    if value ≤ bound then
17:      return bound
18:    if value < score then
19:      score ← value
20: return score

Figure 2.4: The weak alpha-beta algorithm.
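The single-bound mechanism of Figure 2.4 can be sketched in Python as follows (a hypothetical illustration over nested lists; the cut-off tests follow the "greater than or equal" rejection rule described in the text):

```python
from math import inf

def weak_alpha_beta(node, maximizing, bound):
    """Weak alpha-beta (cf. Figure 2.4): the best score at a node is
    passed down to its successors as a single bound, enabling the
    cut-off described in the text."""
    if not isinstance(node, list):
        return node
    score = -inf if maximizing else inf
    for child in node:
        value = weak_alpha_beta(child, not maximizing, score)
        if maximizing:
            if value >= bound:   # the minimizing parent would reject this
                return bound
            score = max(score, value)
        else:
            if value <= bound:   # the maximizing parent would reject this
                return bound
            score = min(score, value)
    return score

tree = [[7, 1, -9], [4, -2, -3], [5, -1, 3]]
print(weak_alpha_beta(tree, False, -inf))  # → 4, same value as mini-max
```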
Figure 2.5: A game tree where the weak alpha-beta algorithm misses cut-offs.
The alpha-beta algorithm can be simplified using the nega-max formulation in a manner similar to the simplification of the mini-max algorithm. The reformulated algorithm is presented in Figure 2.8. Each node is treated as a maximizing node, thus beta is used as the bound that determines when cut-offs are possible. Furthermore, when making a recursive call, the bounds are reversed (the lower bound becomes the upper bound and vice versa) and negated. This allows the sub-node to be treated as a maximizing node. Note that evaluations are computed by EVALUATENEGAMAX as required by the nega-max scheme.

The efficiency of the alpha-beta algorithm is dependent on the order in which the branches are searched. If branches that lead to high scores (or low scores in the case of a minimizing node) are expanded first, then tighter bounds will be obtained for the rest of the search. This will result in a higher number of cut-offs. For some problems, the quality of a branch is not known until the leaves are reached, thus it is difficult to control the algorithm so that it searches "good" branches first. However, in many cases, an educated guess can be made regarding the quality of a branch from some preliminary information. Where such information is available, the efficiency of the search is greatly enhanced if the branches are explored in order of decreasing quality.
2.6 A Perfectly Ordered Game Tree and Node Classification

Consider the game tree in Figure 2.9. It is a revised version of the tree in Figure 2.1. The branches have been arranged so that the best branch at each node appears in the leftmost position. Such a tree is perfectly ordered for an alpha-beta search that expands branches in a left to right order. The nodes of a perfectly ordered game tree can be classified into three types
ALPHABETA(node, alpha, beta)
1: if node.depth = 0 then
2:   return EVALUATE(node)
3: if node.type = max then
4:   score ← alpha
5: else
6:   score ← beta
7: for i ← 1 to node.branch.length
8:   new_node ← TRAVERSE(node, node.branch[i])
9:   if node.type = max then
10:    value ← ALPHABETA(new_node, score, beta)
11:    if value ≥ beta then
12:      return beta
13:    if value > score then
14:      score ← value
15:  else
16:    value ← ALPHABETA(new_node, alpha, score)
17:    if value ≤ alpha then
18:      return alpha
19:    if value < score then
20:      score ← value
21: return score

Figure 2.6: The alpha-beta algorithm.
Figure 2.7: Cut-offs made by the alpha-beta algorithm.
ALPHABETA(node, alpha, beta)
1: if node.depth = 0 then
2:   return EVALUATENEGAMAX(node)
3: for i ← 1 to node.branch.length
4:   new_node ← TRAVERSE(node, node.branch[i])
5:   value ← -ALPHABETA(new_node, -beta, -alpha)
6:   if value ≥ beta then
7:     return beta
8:   if value > alpha then
9:     alpha ← value
10: return alpha

Figure 2.8: The alpha-beta algorithm reformulated using the nega-max scheme.
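A minimal Python sketch of the nega-max alpha-beta of Figure 2.8 (fail-hard, with a hypothetical nested-list tree standing in for the node/TRAVERSE interface):

```python
from math import inf

def alpha_beta(node, alpha, beta):
    """Fail-hard alpha-beta in the nega-max form of Figure 2.8: every
    node is treated as maximizing and the (alpha, beta) window is
    reversed and negated on each recursive call."""
    if not isinstance(node, list):
        return node                # leaf score, on-move player's viewpoint
    for child in node:
        value = -alpha_beta(child, -beta, -alpha)
        if value >= beta:          # cut-off: remaining branches are pruned
            return beta
        if value > alpha:
            alpha = value
    return alpha

tree = [[7, 1, -9], [4, -2, -3], [5, -1, 3]]
print(alpha_beta(tree, -inf, inf))  # → -1, same as plain nega-max
```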
Figure 2.10: Cut-offs achieved by the alpha-beta algorithm on the tree of Figure 2.9.
[12, 17]:

• Type 1 or Principal Variation (PV) Nodes

• Type 2 or CUT Nodes

• Type 3 or ALL Nodes

The type of each node is indicated in Figure 2.10. The figure also illustrates the cut-offs made by an alpha-beta algorithm on the perfectly ordered tree.
2.6.1 Type 1 or PV Nodes

In a perfectly ordered tree, the first sequence of moves searched by the alpha-beta algorithm is also the principal variation. This is the case for any perfectly ordered tree. Each node of the principal variation is described as being type 1 or PV.

Recall that a full alpha-beta search is initiated with the call ALPHABETA(Root, -∞, +∞). Each PV node receives infinite lower and upper bounds. This stems from the fact that PV nodes constitute the start of the search and scores have yet to be established. Since the bounds are infinite, a cut-off never occurs at a PV node and all branches are searched. However, the search order at a PV node is important. For example, consider a PV node of the maximizing type. Initially the lower bound is -∞ and the upper bound is +∞. The score returned by the first branch will be used as the lower bound for the next branch to be expanded. Clearly, if the score returned by the first branch is the highest possible at that node, the bound will not change for the duration of the search at that node. Furthermore, the search benefits from the high lower bound that was established right at the start.

The first successor to a PV node is also a PV node while the other successors are CUT nodes.
2.6.2 Type 2 or CUT Nodes

CUT nodes are successors to PV nodes and ALL nodes. Since a CUT node is not the first successor at a PV node, it will have a bound as established by the PV node's first branch. Once again, consider a PV node of the maximizing type. The second branch to be expanded at the PV node will lead to a minimizing CUT node. The score returned by the PV node's first branch serves as the lower bound at the CUT node. Note that the CUT node does not have an upper bound since that was never determined at the maximizing PV node: a PV node only determines one of the two bounds. Since the tree being searched is perfectly ordered, the first branch searched at a CUT node immediately leads to a cut-off. Just as with PV nodes, search order is important at CUT nodes.

The first successor to a CUT node is an ALL node whereas the rest of the successors are cut off.
2.6.3 Type 3 or ALL Nodes

An ALL node is a successor to a CUT node. Being the first branch at a CUT node, the ALL node obtains the same bound information as its parent. Consider the example of a minimizing CUT node. The CUT node will have a valid lower bound; however, the upper bound will be infinite. The first branch to be expanded at the CUT node leads to a maximizing ALL node. Since the upper bound is infinite at the ALL node, no cut-offs can be made and all branches are searched. Due to the perfect ordering of the tree, the scores returned by the ALL node's successors are not high enough to increase the lower bound: the scores returned are worse than that established by the PV node higher up in the tree. In fact, the search order at an ALL node is irrelevant.

The successors at an ALL node are all CUT nodes.
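The successor-type rules of Sections 2.6.1 through 2.6.3 can be collected into a small sketch (an illustration added here; the function name and list representation are assumptions, not part of the thesis):

```python
def successor_types(node_type, n_children):
    """Successor types of a node in a perfectly ordered game tree,
    following the rules of Sections 2.6.1-2.6.3."""
    if node_type == "PV":
        # the first successor is PV, the rest are CUT
        return ["PV"] + ["CUT"] * (n_children - 1)
    if node_type == "CUT":
        # only the first successor (an ALL node) is searched;
        # the remaining branches are cut off
        return ["ALL"]
    if node_type == "ALL":
        return ["CUT"] * n_children
    raise ValueError(node_type)

print(successor_types("PV", 3))  # → ['PV', 'CUT', 'CUT']
```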
2.7 Aspiration Search
Wien the alplia-beta algorithm is invoked n-itli the c d A \ ~ ~ t ï ~ \ ~ ~ ~ - - \ ( R o o t . alpha. beta). it re-
turns a mliie becwecn alpha ancl beta - note that alpha aricl beta are valid return values LIS
well. Norniallj-. to determine the game tree t ~ ~ l u c , one woiilcI make a cal1 to the algoritlini witli
alpha ancl beta set to -x and +x respectively. Consicler the case where tlic final game tree
value is known wit 11 sonie certainty. Greater efficicncq- can be achiewcl by ernplo>%ig aspiration
search [lG. 1.51. Figure 2.11 illustrates tlie process. Tlie estimatecl gatne trce value iç Ii. An
error factor e is used to cletermine the tn-O initial botinds alpha ancl heta. The lower bound is
set at orle error factor below V and the tipper bouncl is set a t one error Factor abob-e I./. IVlien
the alpha-beta algorithm is called witti thesc bouncls. the return value is one of three tqpes:
A retiirn value betn-eeri alpha riricl beta: In this case tlie game tree valiie lias been cleter-
mineci and furcher search is not necessary. The trec searcli was Liiglily efficient due to the
narrow searcIr winclow-
A retiirn value ecliial to alpha: The search licu fuilecl-low. The real game tree value is not
knowi: a11 that is known is tliat the tree value is less thuri or eq~ial to alpha.
A retwn vciliie ecliial to betu: T h e searcli lias fuilecl-high- Tlic gamc trcc ~idt ie is grcuter
than or ecliial to beta.
In a fail-low or fail-high situation, n new searcli miist be perforrnecr witli nen- boiiricls in orcler
to determine the real tree value. During a new search, many of the nodes tliar: were visited by
the initial search may be revisited. Clearly. the new searcli rcchces efficiency: liowever. note
tliac this sitiiation only arises d i e n the original searcli fails - a sitiiation tliat does riot arise
too ofteri since tliere is an estimate for the final garne tree valire.
2.8 The Negascout Algorithm
As inentioned in Section 2.5. when there is some preliminary 'qriality" information about the
branches at a riocle. the alpha-beta algorithm benefits from the expansion oE the branches in
order of decreasirig c lua l i~ . If this preliminary information cc- be used to predict which of
the branclies is the best at a node with reasonable uccuracy. a technique similar to aspiration
searcli can be employed recursively to obtaîn a highly efficient tree searcher.
1: alpha ← V - e
2: beta ← V + e
3: score ← ALPHABETA(Root, alpha, beta)
4: if score = beta then
5:   score ← ALPHABETA(Root, beta, +∞)
6: elsif score = alpha then
7:   score ← ALPHABETA(Root, -∞, alpha)

Figure 2.11: Aspiration search.
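The procedure of Figure 2.11 can be sketched in Python (an illustration; the `fail_hard` helper below is a hypothetical stand-in for a real fail-hard alpha-beta searcher):

```python
from math import inf

def aspiration_search(root, v, e, alpha_beta):
    """Aspiration search (cf. Figure 2.11): search a window of one
    error factor e around the estimate v, then widen the window on a
    fail-high or fail-low result. `alpha_beta` is any fail-hard
    searcher taking (root, alpha, beta)."""
    alpha, beta = v - e, v + e
    score = alpha_beta(root, alpha, beta)
    if score == beta:                        # fail-high
        score = alpha_beta(root, beta, inf)
    elif score == alpha:                     # fail-low
        score = alpha_beta(root, -inf, alpha)
    return score

def fail_hard(true_value):
    """Toy stand-in for a real search: a fail-hard alpha-beta returns
    the true game tree value clamped to the [alpha, beta] window."""
    return lambda root, alpha, beta: max(alpha, min(beta, true_value))

search = fail_hard(42)
print(aspiration_search(None, v=40, e=5, alpha_beta=search))  # → 42 (in window)
print(aspiration_search(None, v=30, e=5, alpha_beta=search))  # → 42 (after fail-high)
print(aspiration_search(None, v=50, e=5, alpha_beta=search))  # → 42 (after fail-low)
```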
The negascout algorithm [20]¹ is illustrated in Figure 2.12. Only the first branch at each node is searched with the full window. The rest of the branches are searched with a null-window. A null-window describes the case where alpha and beta are separated by 1 unit. In this case no real search is performed; the null-window search amounts to a test on the subtree to determine whether its value is less than or equal to alpha or greater than or equal to beta. In the negascout algorithm, after searching the first branch with the full window, the rest of the branches are simply tested to see whether they have a value that exceeds the best score so far at that node. If any of the branch tests fails high, a new search is performed with an expanded window to establish the true score of that particular branch.
2.9 Iterative Deepening

Rather than tackle the entire tree at once, depending on the application, it may be advantageous to determine the tree value in steps. First, the game tree value and principal variation for a tree of depth one is determined. Then, a similar process is carried out on a tree of depth two. This iteratively deepening process [16, 15] continues until the tree of the required depth has been searched. There are two advantages to this scheme. In a situation where there is a time restriction on the search, the search of the entire tree in a single pass may be too time consuming. However, if the search is carried out in steps, even if the search needs to be stopped at some depth, the previous iteration provides a reasonable solution albeit at a lower search depth. The second advantage is that each iteration can collect useful information about the tree for the next iteration. Consider some examples of this iteration-to-iteration information:

¹The algorithm presented here is a simplified version of the one presented in [20]. In particular, this version does not make use of the fail-soft extension and performs new searches even when the height of the subtree is less than two.
NEGASCOUT(node, alpha, beta)
1: if node.depth = 0 then
2:   return EVALUATENEGAMAX(node)
3: new_node ← TRAVERSE(node, node.branch[1])
4: value ← -NEGASCOUT(new_node, -beta, -alpha)
5: if value ≥ beta then
6:   return beta
7: if value > alpha then
8:   alpha ← value
9: for i ← 2 to node.branch.length
10:  new_node ← TRAVERSE(node, node.branch[i])
11:  value ← -NEGASCOUT(new_node, -alpha - 1, -alpha)
12:  if value > alpha and value < beta then
13:    value ← -NEGASCOUT(new_node, -beta, -alpha - 1)
14:  if value ≥ beta then
15:    return beta
16:  if value > alpha then
17:    alpha ← value
18: return alpha

Figure 2.12: The negascout algorithm.
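A Python sketch of this simplified negascout (fail-hard, without the fail-soft extension; the nested-list tree is a hypothetical example):

```python
from math import inf

def negascout(node, alpha, beta):
    """Simplified negascout (cf. Figure 2.12): full-window search of
    the first branch, null-window tests of the rest, and a re-search
    whenever a null-window test fails high."""
    if not isinstance(node, list):
        return node
    value = -negascout(node[0], -beta, -alpha)
    if value >= beta:
        return beta
    if value > alpha:
        alpha = value
    for child in node[1:]:
        value = -negascout(child, -alpha - 1, -alpha)     # null-window test
        if alpha < value < beta:                          # test failed high
            value = -negascout(child, -beta, -alpha - 1)  # re-search
        if value >= beta:
            return beta
        if value > alpha:
            alpha = value
    return alpha

tree = [[7, 1, -9], [4, -2, -3], [5, -1, 3]]
print(negascout(tree, -inf, inf))  # → -1, the same value alpha-beta finds
```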
1:  node ← Root
2:  V ← initial_estimate
3:  for d ← 1 to depth
4:      node.depth ← d
5:      alpha ← V - e
6:      beta ← V + e
7:      score ← ALPHABETA(node, alpha, beta)
8:      if score = beta then
9:          score ← ALPHABETA(node, beta, +∞)
10:     elsif score = alpha then
11:         score ← ALPHABETA(node, -∞, alpha)
12:     V ← score
Figure 2.13: An iteratively deepening alpha-beta search that uses the game tree value from the
previous iteration in an aspiration search of the current iteration.
• The principal variation from the previous iteration is usually a good indicator of what
the principal variation from the current iteration will look like. Search efficiency usually
improves if the branches from the last principal variation are expanded first.

• The game tree value from the last iteration is a good estimate of what the game tree
value will be at the end of the current iteration. Therefore, the game tree value from
the previous iteration can be used to guide an aspiration search of the tree to the depth
required by the current iteration.
Figure 2.13 combines iterative deepening with aspiration search. Aspiration search uses
a small window centered around the game tree value from the preceding iteration for the
search during the current iteration.
• The previous iteration may have retained quality information about certain key branches
in the tree. The nature of this information is usually application dependent. This infor-
mation is then used by the current iteration to determine the ordering of the branches at
a node so that good branches will be expanded first.
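The interaction between iterative deepening and the aspiration window of Figure 2.13 can be sketched in Python. This is an illustrative sketch, not the thesis's code: it assumes only that some fail-hard alpha-beta searcher `search(depth, alpha, beta)` is available, and the function and parameter names are the sketch's own.

```python
INF = float('inf')

def aspiration_driver(search, max_depth, initial_estimate, e):
    """Iterative deepening with an aspiration window of half-width e.
    `search(depth, alpha, beta)` is any fail-hard alpha-beta searcher,
    so a score at a window bound signals a fail high or fail low."""
    v = initial_estimate
    for d in range(1, max_depth + 1):
        alpha, beta = v - e, v + e
        score = search(d, alpha, beta)
        if score >= beta:                  # fail high: re-search above the window
            score = search(d, beta, INF)
        elif score <= alpha:               # fail low: re-search below the window
            score = search(d, -INF, alpha)
        v = score                          # seed the next iteration's window
    return v
```

A fail-hard searcher clamps its result into [alpha, beta], so a toy stand-in such as `lambda d, a, b: max(a, min(b, true_value[d]))` is enough to exercise both re-search paths.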
Chapter 3
Parallel Alpha-Beta Search
3.1 Introduction
Parallelization of the alpha-beta algorithm and its variants has proven to be difficult. The
overheads in a parallel implementation of the algorithm can be classified into three categories
[21]:

• Communication overhead

• Synchronization overhead

• Search overhead
First, consider the communication overhead. Communication between processors is normal
in any parallel algorithm; however, there is one type of communication that is unique to parallel
alpha-beta. The sequential alpha-beta algorithm updates its two bounds, alpha and beta, as
the search of a game tree progresses. When searching in parallel, if one processor finds an
improvement to alpha or beta, it informs the other processors working below that node so that
they can make use of the tighter bound that was just discovered.
Synchronization overhead results when a processor sits idle while waiting for some event to
occur. For example, if four processors, P1, P2, P3 and P4, are working together at a node,
processors P2, P3 and P4 may be waiting for the result of a search being conducted by processor
P1 on some branch at that node.
Search overhead is a consequence of the parallel alpha-beta algorithm examining nodes that
would have been avoided by the sequential version. When parallel search is initiated at a node,
the best score might not have been discovered as yet. As a result, parallel search is conducted
with a wider window than in the sequential case. Furthermore, after parallel search has been
initiated, one processor may discover that search is no longer necessary at the node due to a
cut-off condition; the other processors have essentially performed useless work.
The three overheads are not independent of each other; they are related in a complex
manner. Let us consider a few examples of this complex relationship. To reduce search overhead,
the processors may continuously update each other with the latest search information, thereby
increasing communication overhead. In another strategy to reduce search overhead, we may
require that a certain number of branches be explored in a sequential fashion before attempting
parallel search at a node. When the parallel search is actually started, it is highly likely
that the score at the node will have stabilized. Communication overhead is also reduced as
messages carrying bound information will be less frequent. However, synchronization overhead
will increase since there will be several processors waiting idly while a single processor completes
the required number of branches.
In this chapter, important work in the area of alpha-beta parallelization is presented. While
the literature describes several methods [1], only three are described here. These three methods
have been chosen carefully from the several that are available. The first method, principal
variation splitting (PVSplit) [16], is the result of some of the earliest attempts at parallelizing
alpha-beta. Although it is a relatively old idea, PVSplit has been the subject of much research
and it has been the source of inspiration for several newer methods. The other two methods,
young brothers wait concept (YBWC) [6] and dynamic tree splitting (DTS) [9], are more recent
and spectacular speed-ups have been reported. These two methods embody two different design
goals: YBWC was developed in a distributed environment where communication costs are high,
whereas DTS was developed in an environment where communication is quite cheap.
3.2 Super-Linear Speed-Up?
Although it rarely ever happens in practice, parallel alpha-beta may yield super-linear speed-
ups. Consider the tree of Figure 3.1. It is assumed that this tree is part of a much bigger tree.
When the search arrives at the Root node, alpha has a value of -5 and beta has a value of
0.¹ It is assumed that each major operation in alpha-beta can be completed in one time unit.
Table 3.1 illustrates the progress of a single processor executing the alpha-beta algorithm on
the tree. It takes 25 time units to complete the search. Now, consider the situation where two
processors, P1 and P2, are working together at the Root node. The first processor handles the
subtree rooted at node A while the second handles the subtree rooted at node B. The progress
of processors P1 and P2 is illustrated in Tables 3.2 and 3.3 respectively. Since move ordering is
bad within the subtree rooted at A, all nodes will have to be explored. However, there is perfect
ordering in the subtree rooted at B and cut-offs are plentiful. The second processor completes
its search in a short period of time and it discovers that a cut-off condition exists at the Root.
¹This discussion uses mini-max conventions as opposed to nega-max conventions.
Figure 3.1: Parallel search on this tree yields super-linear speed-ups.
Processor P1 is then stopped and the search of the Root node is complete. With two processors,
the search takes a mere 11 time units. Clearly, super-linear speed-up is a possibility, but the
rather contrived nature of the tree in Figure 3.1 and the bounds used should be indicative of
the fact that super-linear speed-up rarely occurs in practice.
3.3 Principal Variation Splitting
In PVSplit [16], the rule is that the first branch at a PV node must be searched before parallel
search of the remaining branches may begin. All processors travel down the first branch at
each PV node until they reach the PV node that is one level above the leaf nodes. Here one
processor searches the first branch while the other processors wait. Once the first branch has
been examined, all processors join the search effort. Each processor takes away a branch at
a time and determines its value. If a processor discovers an improvement to the score at the
node, it informs the other processors of the updated value. When there aren't any unassigned
branches, a processor that runs out of work remains idle until the other processors finish. Once
all branches have been examined, the search effort moves to the current node's parent. Since the
value of the parent node's first branch was just computed, parallel search can be started at the
parent as well. This process continues upwards in the tree until all the branches at the Root
node have been examined.
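The splitting rule just described can be sketched with a thread pool. The Python sketch below is a simplified illustration, not the PVSplit implementation of [16]: it omits the sharing of score improvements between workers that are already running (each parallel branch uses the bound known at launch), and the nested-list tree representation (integers as leaf evaluations, negamax convention) is an assumption of the sketch.

```python
from concurrent.futures import ThreadPoolExecutor

INF = float('inf')

def alphabeta(tree, alpha, beta):
    """Sequential fail-hard alpha-beta (negamax form) on nested lists."""
    if isinstance(tree, int):
        return tree
    for child in tree:
        value = -alphabeta(child, -beta, -alpha)
        if value >= beta:
            return beta
        if value > alpha:
            alpha = value
    return alpha

def pvsplit(tree, alpha, beta, pool):
    """PVSplit-style search: recurse down the first branch sequentially
    (establishing a bound along the principal variation), then search the
    remaining branches in parallel with the bound just obtained."""
    if isinstance(tree, int):
        return tree
    value = -pvsplit(tree[0], -beta, -alpha, pool)
    if value >= beta:
        return beta
    if value > alpha:
        alpha = value
    # Younger branches: searched concurrently with the tighter bound.
    futures = [pool.submit(alphabeta, child, -beta, -alpha) for child in tree[1:]]
    for f in futures:
        value = -f.result()
        if value >= beta:
            return beta
        if value > alpha:
            alpha = value
    return alpha
```

With an infinite initial window both routines return the same root value; the parallel version may examine extra nodes (search overhead) when the first branch is not the best, exactly as discussed above.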
Figure 3.2 illustrates the progress of two processors, P1 and P2, as they use PVSplit on a
small tree. Both processors travel down the leftmost path until they reach node D. At this node,
P2 remains idle while P1 explores branch g. Once the exploration of g is complete, parallel
search is started; P1 explores h while P2 explores i. When D has been completely evaluated,
Time   Node   Beta   Action
 0     Root    0     Explore a
 1     A       0     Explore c
 2     C       0     Explore g
 3     G       0     Return -2
 4     C       0     Explore h
 5     H       0     Return -1
 6     C       0     Return -1
 7     A       0     Explore d
 8     D      -1     Explore i
 9     I      -1     Return -4
10     D      -1     Explore j
11     J      -1     Return -3
12     D      -1     Return -3
13     A       0     Return -3
14     Root    0     Explore b
15     B       0     Explore e
16     E       0     Explore k
17     K       0     Return 2
18     E       0     Cut-off
19     B       0     Explore f
20     F       0     Explore m
21     M       0     Return 4
22     F       0     Cut-off
23     B       0     Return 0
24     Root    0     Cut-off

Table 3.1: Progress of the alpha-beta algorithm on the tree of Figure 3.1.
Time   Node   Beta   Action
 0     Root    0     Explore a
 1     A       0     Explore c
 2     C       0     Explore g
 3     G       0     Return -2
 4     C       0     Explore h
 5     H       0     Return -1
 6     C       0     Return -1
 7     A       0     Explore d
 8     D      -1     Explore i
 9     I      -1     Return -4
10     D      -1     Explore j

Table 3.2: Progress of P1 on the tree of Figure 3.1.
Time   Node   Beta   Action
 0     Root    0     Explore b
 1     B       0     Explore e
 2     E       0     Explore k
 3     K       0     Return 2
 4     E       0     Cut-off
 5     B       0     Explore f
 6     F       0     Explore m
 7     M       0     Return 4
 8     F       0     Cut-off
 9     B       0     Return 0
10     Root    0     Cut-off

Table 3.3: Progress of P2 on the tree of Figure 3.1.
Figure 3.2: Two processors using PVSplit to divide up the work in a small tree.
the search moves to its parent, A. Since the eldest brother, d, has been examined at A, parallel
search can be employed once again. Processor P1 evaluates e while P2 evaluates f. Once A has
been evaluated, parallel search begins at the Root node with P1 evaluating b and P2 evaluating
c.
Let us examine some of the reasoning behind PVSplit. First, at a PV node all branches have
to be searched, thus parallel search is a good idea at this type of node. Second, by requiring that
the first branch be examined before parallel search is started at a node, parallel search starts
only when a bound has been determined. If the branches have been ordered according to some
preliminary quality information, then the score returned by the first branch may be the best
possible at that node. In fact, the original work [16] described PVSplit as a method for searching
strongly-ordered trees (refer to Section 4.3) where the first branch at any node is the best 70
percent of the time. If the best possible score is obtained when the first branch is examined,
parallel search will examine precisely the same nodes as the sequential version. Therefore, there
is no search overhead if at each PV node the first branch is also the best. Third, search at a
PV node examines more nodes than the search of any other node of equivalent height because
a PV node has no bound information: alpha and beta are at negative infinity and positive
infinity respectively. Thus, a PV node is a reasonable choice as a site for parallel search.
The PVSplit method is not without its faults. When the first branch is not the best, search
overhead increases as parallel search is conducted with a bound that is not as tight as in a
sequential search. Depending on the branching factor, the method may not be able to use a large
number of processors effectively. For example, in chess, where the average branching factor is 32,
the method does not have a mechanism for handling more than 31 processors. Synchronization
overhead is a significant problem in PVSplit. Consider what happens as parallel search at a
Processors   1     2     4     8     16
Speed-Up    1.0   1.8   3.0   4.1   4.6

Table 3.4: Performance of PVSplit on chess trees.

Processors   1     2     4     8     16
Speed-Up    1.0   1.9   3.4   5.4   6.0

Table 3.5: Performance of EPVS on chess trees.
node nears its end. Most of the processors in the search effort will be idle, waiting for a few
processors that are searching "difficult" branches. A difficult branch is one that requires a
larger search effort because more nodes are examined in the subtree generated by that branch
compared to the other branches at the node.
Experiments with PVSplit have shown that speed-up is limited to a large extent by syn-
chronization overhead [17, 21, 9]. Searching chess trees on a Cray C90, the technique produces
the speed-ups in Table 3.4 [9]. The speed-up seems to be limited to an upper bound of 5. This
has led to some interesting work that tries to reduce the synchronization overhead in PVSplit.
In enhanced principal variation splitting (EPVS) [9], when a processor becomes idle, all pro-
cessors move to the subtree being searched by one of the busy processors and a site for parallel
search is created two levels below the original site. Experiments with this method produced
the speed-ups in Table 3.5 [9]. Although this method is a little better than plain PVSplit, it is
still not very efficient when a large number of processors are involved in the search effort. In
Dynamic PVSplit (DPVS) [21], each processor runs a version of PVSplit. However, the differ-
ence is that a controller process dynamically assigns idle processors to help the busy processors
in the system. Searching chess trees, this technique obtained a speed-up of 7.64 on a network
of 19 Sun 3/75s.
3.4 Young Brothers Wait Concept (YBWC)
There are two different versions of YBWC. The earliest one is referred to as the weak YBWC [6].
A more recent version that modifies the technique slightly is referred to as the strong YBWC
[5]. Whenever a distinction needs to be made between the two versions, the qualifications weak
and strong will be used. However, if the discussion applies to both techniques, then the method
will be referred to as simply YBWC.
At any node, the first branch to be expanded is referred to as the eldest brother while the
other branches are referred to as the younger brothers. In weak YBWC, the rule is as follows:
the eldest brother has to be examined before parallel search of the younger brothers is possible.
Although this is similar to PVSplit, in YBWC parallel search is possible at any node, not just
at PV nodes.
Before delving into the method's details, the concept of node ownership is introduced. A
processor that owns a node is responsible for its evaluation. It is also responsible for returning
the node's evaluation to its parent. Note that this may involve communication if the owner
of the evaluated node is different from that of its parent. Usually, a node and its successors
have the same owner. However, multiple processors can collaborate on a node if some of the
successors have different owners. In YBWC, once a processor is given ownership of a node, the
ownership of that node is not transferable to another processor.
At the start, one processor is given ownership of the Root node while the other processors
remain in an idle state. A processor, P1, that is idle selects another processor, P2, at random
and transmits a message requesting work. Processor P2 has work available if there is at least
one node in the subtree it is examining that satisfies the weak YBWC criterion. That is, P2 has
work available if it owns a node at which the eldest brother has been evaluated. The node that
satisfies the criterion becomes the split-point; if there are many nodes that satisfy the criterion,
then the node that is the highest in the tree is selected as the split-point. A split-point is a
node that has been chosen as a site for parallel search.
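The split-point selection rule (eldest brother evaluated, highest eligible node wins) can be sketched as follows. The `SearchNode` record and its fields are illustrative assumptions of this sketch, not the data structures of [6].

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class SearchNode:
    height: int                  # distance from the leaves; the Root is highest
    eldest_done: bool            # has the eldest brother been fully searched?
    unassigned: int              # branches not yet handed to any processor
    children: List["SearchNode"] = field(default_factory=list)

def find_split_point(node: SearchNode) -> Optional[SearchNode]:
    """Return the highest node in this subtree that satisfies the weak
    YBWC criterion (eldest brother searched) and still has work left."""
    best = None
    stack = [node]
    while stack:
        n = stack.pop()
        if n.eldest_done and n.unassigned > 0:
            if best is None or n.height > best.height:
                best = n
        stack.extend(n.children)
    return best
```

A processor answering a work request would run this scan over the subtree it owns and, on success, establish the returned node as the split-point.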
If P2 has work available then a master-slave relationship is established between P2 and P1.
Note that P1 may be one of many slaves to P2. The master and its slaves share the search
effort at the split-point. Each processor takes away a branch at a time until the search at the
split-point is complete. As in PVSplit, if one processor finds an improvement to the score at
the split-point then the new score is transmitted to the other processors involved. A processor
may also discover a cut-off condition at the split-point. In this case, the search is complete and
the slaves return to their idle state. A slave may also return to its idle state if there isn't any
work left at the split-point. If the master returns from the search of some branch to find no
work at the split-point, it should not remain idle while waiting for the busy processors to finish
because this would increase synchronization overhead. Instead, the master acts as a slave to
one of the busy processors. This is referred to as the helpful master concept.
When an idle processor P1 transmits a message requesting work to a processor P2, the
latter may not have any work available. Processor P2 forwards the request message to another
randomly selected processor. However, if the message has already traveled through a certain
number of processors, P2 throws away the message and informs P1 that no work is available.
Processor P1 then begins requesting again.
In strong YBWC, the nodes of a game tree are classified into three types:

• Y-PV: The root node is of type Y-PV. The first successor at a Y-PV node is of type
Y-PV while the rest of the successors are Y-CUT.

• Y-CUT: The first successor is a Y-ALL node while the rest are Y-CUT nodes.

• Y-ALL: All successors are Y-CUT nodes.
Note that the definition of a Y-PV node is the same as that of a PV node except that it produces
Y-PV and Y-CUT nodes as successors. Furthermore, a Y-ALL node is similar to an ALL node
except that it produces Y-CUT nodes as successors. However, a Y-CUT node is quite different
compared to a CUT node. Recall that a CUT node is defined as having only one successor
and that the lone successor is of type ALL. While that definition is suitable for a tree that is
perfectly ordered, when a tree is imperfectly ordered, a CUT node may have more than one
successor. The new node classification specifies that these additional successors at CUT nodes
are of type Y-CUT.
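The classification above can be stated as a small recurrence over branch indices. The following Python sketch is illustrative (the function names are the sketch's own); it classifies the node reached from the Y-PV root by taking the i-th branch at each step.

```python
def successor_types(node_type):
    """Types of the first successor and of the remaining successors
    under the strong YBWC classification."""
    if node_type == "Y-PV":
        return "Y-PV", "Y-CUT"
    if node_type == "Y-CUT":
        return "Y-ALL", "Y-CUT"
    if node_type == "Y-ALL":
        return "Y-CUT", "Y-CUT"
    raise ValueError(node_type)

def classify(path):
    """Type of the node reached from the Y-PV root along `path`,
    a sequence of 1-based branch indices."""
    t = "Y-PV"
    for i in path:
        first, rest = successor_types(t)
        t = first if i == 1 else rest
    return t
```

For example, the leftmost path consists entirely of Y-PV nodes, while any non-first branch of a Y-CUT node is again Y-CUT, capturing the extra successors that appear under imperfect ordering.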
Strong YBWC uses the weak YBWC criterion at Y-PV and Y-ALL nodes. However, at Y-
CUT nodes, strong YBWC enforces a different rule: all "promising" branches must be examined
before parallel search is possible. A promising branch is one that is likely to produce a cut-off
based on some preliminary quality information. The exact definition of a promising branch is
application dependent. In strong YBWC there is a longer wait at Y-CUT nodes before parallel
search is possible. Although this reduces the potential parallelism, the search overhead is greatly
reduced and, in practice, strong YBWC produces better speed-ups than weak YBWC.
On a Parsytec SC 320 machine (based on the T800 Transputer), weak YBWC obtained a
speed-up of 137 when searching chess trees with 256 processors [5]. Strong YBWC obtained
a speed-up of 142 on the same system. Experiments were also conducted on a Parsytec GCel
machine (based on the T805 Transputer) with 1024 processors. With 1024 processors, strong
YBWC produced a speed-up of 344.
3.5 Dynamic Tree Splitting (DTS)
DTS [8, 9] uses a peer-to-peer approach rather than a master-slave approach as in YBWC.
Node ownership takes on a different meaning in DTS. While many processors may collaborate
on a node, the processor that finishes its search last is responsible for returning the node's
evaluation to its parent.
At the start, one processor is set to search the Root node while the other processors are in
an idle state. An idle processor consults a global list of active split-points (SP-LIST) to find
work to do. If a split-point with work is found, the idle processor joins the other processors that
are working at that split-point and the work at that node is shared. However, if no work can be
found in SP-LIST, the idle processor broadcasts a HELP message to all processors. On receipt
of the HELP message, a processor that is busy copies the state of the subtree it is examining to
a shared area. The idle processor then examines the shared area to find a suitable split-point.
If a split-point can be found, the split-point is first copied into SP-LIST and the idle processor
then shares the work at the split-point with the processor that originally expanded the node.
If a suitable split-point cannot be found in the shared area, the idle processor rebroadcasts the
HELP message after a small delay.
When a processor returns from the search of some branch to find no work at a split-point, the
processor simply enters the idle state where it can try to find work at another node. However, if
a processor returning from the search is the last processor at the split-point, then the processor
is responsible for returning the node's evaluation to its parent. Furthermore, this processor
does not enter the idle state but continues working at the parent node.
Similar to PVSplit and YBWC, if one processor discovers an improved score, the score is
shared with the other processors working at the split-point. Instead of an improved score, if a
cut-off condition is discovered, a single processor is left at the node as its owner while the other
processors return to their idle states.
Finding a suitable split-point after having broadcast the HELP command is rather compli-
cated. The selection procedure is not as simple as the one found in YBWC. First, the type of
each node is determined. The set of rules that is used to determine node type in DTS is quite
different from YBWC, therefore different names are used to avoid any confusion. A node is
classified into three types:

• D-PV: A node that has the same alpha and beta values as the Root.

• D-CUT: A minimizing node with the same beta as the Root or a maximizing node with
the same alpha as the Root.

• D-ALL: Any node that does not fit the D-PV and D-CUT criteria.
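The three rules translate directly into a bound comparison. The Python sketch below (mini-max convention) is an illustration of the classification rules only, not the DTS implementation; the function and parameter names are the sketch's own.

```python
def dts_type(alpha, beta, node_is_max, root_alpha, root_beta):
    """Classify a node for DTS split-point selection by comparing its
    alpha-beta bounds with the Root's (mini-max convention)."""
    if alpha == root_alpha and beta == root_beta:
        return "D-PV"
    # A minimizing node sharing the Root's beta, or a maximizing node
    # sharing the Root's alpha, behaves like a CUT node.
    if (not node_is_max and beta == root_beta) or (node_is_max and alpha == root_alpha):
        return "D-CUT"
    return "D-ALL"
```

Note how any node whose bounds have drifted away from the Root's on both sides falls into D-ALL, which is why D-ALL is broader than the perfect-ordering ALL type.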
The types D-PV and D-CUT are equivalent to the normal types, PV and CUT, respectively.
However, the D-ALL node is much broader in scope than the ALL type. Although every ALL
node is also a D-ALL node, the D-ALL type also encompasses those nodes that are searched
due to imperfect ordering. After determining the node type, there are two override phases.
During the first override phase, a node's type is changed from D-CUT into D-ALL if more than
three nodes have been examined at the node without having achieved a cut-off. The second
override phase deals with the situation where there are several D-ALL nodes at consecutive
levels in the tree. DTS only allows two D-ALL nodes to be consecutive in the tree. Following
the second D-ALL node, nodes are forced into an alternating sequence of D-CUT and D-ALL
nodes.
Processors   1     2     4     8     16
Speed-Up    1.0   2.0   3.7   6.6   11.1

Table 3.6: Performance of DTS on chess trees.
There is also a confidence factor associated with each D-CUT and D-ALL node. If many
moves (up to a limit of three) have been searched at a D-CUT node, then the confidence that
it is a D-CUT node is lowered. If several moves have been searched at a D-ALL node, then the
confidence that it is a D-ALL node increases. A node's suitability as a split-point is based on
four factors:
• The node must be of type D-PV or D-ALL.

• The height of the node: nodes that are higher up in the tree (closer to the root) represent
more work.

• If it is a D-PV node, its first branch must have been searched.

• If it is a D-ALL node, the confidence factor should be relatively high.
The process of selecting a split-point is quite complicated, but all this effort is in an attempt to
reduce the search overhead.
Searching chess trees on a Cray C916/1024 machine, DTS produced the speed-ups in Ta-
ble 3.6 [9]. Since DTS was designed with shared memory in mind, experiments with a large
number of processors are not available, as shared memory multiprocessors are hard to find in
large configurations.
Chapter 4
Artificial Game Trees
4.1 Introduction
Although a wide variety of game trees arise in practice, as far as the alpha-beta algorithm
is concerned, a tree's average branching factor and the quality of its branch ordering usually
govern the length of time spent searching the tree. Searching artificially generated trees allows
one to examine the algorithm's behavior for a wide variety of branching factors and branch
orderings. When artificial trees are used as a model for the trees encountered in practice, they
should mimic the behavior exhibited by real trees. This chapter introduces the notion of an
exponentially ordered tree. A method for generating such trees is also described.
4.2 Generating Artificial Trees
-4ny method used in artificial tree generation sliould satisfy three requirements, First. the
method must be able co generate a wide variety of trees. Second. wlien identical input param-
eters are used. iclentical trees shoulct be proctucecl. This is part icularly important bccause t lie
artificial trees are risecl to compare different teciinicpes ancl iclentical trees m~ist he presentect
to eacli tecliliiclue. TtiircI. the order of brandi espansion slioiilcl not affect the tree generatecl.
In secliientid searc11 this recluirernent has no effect. but in parallel search the order of branch
expansion is iisually depencient on the parallel searcli methocl. The methocl clescribed below
satisfies al1 three recluirements.
Three routines used for random number generation in the artificial tree generator are exam-
ined first, before a treatment of the method itself. Figure 4.1 shows these three routines. The
RANDOM function is the main random number generator. It is of the linear congruential type.
The RANDOMSEED function is used to seed the main generator. In reality, this is also another
linear congruential generator. Normally, it would have been sufficient to simply copy the seed
1: constant random_limit ← 2147483647

RANDOMSEED(rnd, seed)
2: rnd.state ← (16807 · seed) mod random_limit

RANDOM(rnd)
3: rnd.state ← (48271 · rnd.state) mod random_limit
4: return rnd.state

RANDOMRANGE(rnd, lower, upper)
5: return RANDOM(rnd) · (upper − lower + 1) / random_limit + lower
Figure 4.1: Random number routines used by the artificial tree generator.
into the generator's state variable. Furthermore, if the seeding function had to use a multiplier,
it could have used the same multiplier as the main generator. However, a standard seeding
function is not adequate for the artificial tree generator because it uses multiple random num-
ber streams and each stream is initialized using a seed generated by another stream. If either no
multiplier or the same multiplier was used, then the same sequence of random numbers would
be generated by a stream and the stream that produced its seed. The random tree method [4],
which is used for parallel random number generation, provides the necessary insight to solve the
problem of similar sequences: a second linear congruential generator can be used to separate
the two sequences so that they appear different. Additionally, note that the multipliers used,
16807 and 48271, and the modulus, 2147483647, are known to have good randomness properties
[11]. The RANDOMRANGE function is an extension of RANDOM to provide random numbers
in a certain range, [lower, upper]. It simply scales and shifts a random number generated by
RANDOM into the appropriate range before returning the result.
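The three routines of Figure 4.1 translate directly into Python. The class-based packaging below is an assumption of this sketch, while the constants 16807, 48271 and 2147483647 are those of the figure; integer division stands in for the figure's scaling division.

```python
RANDOM_LIMIT = 2147483647          # 2**31 - 1, the modulus of Figure 4.1

class Rnd:
    """One stream of the tree generator's random numbers (Figure 4.1)."""

    def __init__(self, seed):
        # RANDOMSEED: a second multiplier (16807) decorrelates a stream
        # from the stream that produced its seed.
        self.state = (16807 * seed) % RANDOM_LIMIT

    def random(self):
        # RANDOM: the main linear congruential generator.
        self.state = (48271 * self.state) % RANDOM_LIMIT
        return self.state

    def random_range(self, lower, upper):
        # RANDOMRANGE: scale and shift into [lower, upper].
        return self.random() * (upper - lower + 1) // RANDOM_LIMIT + lower
```

Two streams created from the same seed produce identical sequences, which is the determinism requirement of Section 4.2, while a stream and the stream that seeded it diverge because of the second multiplier.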
Virtually all of the tasks carried out by the artificial tree generator can be combined into
a single routine. This routine will be referred to as BRANCHGENERATE and is presented in
Figure 4.2. Every internal node invokes BRANCHGENERATE to generate its successors. The
generation of successors at each node is controlled by two variables, seed and pscore. Both of
these variables are set by a node's parent. The generation of successors is as follows. First,
a new stream of random numbers is initialized using seed (lines 3 - 4). Each successor needs
values for its seed and pscore variables. It also needs values for its depth (which represents
the height of the tree at the node) and type (maximizing or minimizing) variables. Strictly
speaking, depth and type are not related to artificial tree generation; they are integral to
any tree expansion routine, artificial or not. Each successor's seed is generated by a call to
RANDOM (line 18). Determining a value for a successor's pscore is more involved. A node's
pscore value represents the score that is obtained when the results returned by the successors
are maximized or minimized as required at the node. The pscore for the first successor is set
equal to the node's pscore. The smallest and largest scores that can be generated are held in
the constants score_min and score_max, respectively. If a node is of the maximizing type, then
the value of the pscore variable for each of the remaining successors is obtained by generating
random numbers in the range [score_min, pscore] (line 12). However, if the node is of the
minimizing type, then random numbers in the range [pscore, score_max] are used instead (line
16). Consider what has been accomplished so far. The best score at a node is always in its
first position. If the node being expanded is of the maximizing type, the remaining successors
will all have scores that are smaller than or equal to the first successor. If the node is of the
minimizing type, the remaining successors will have scores that are greater than or equal to
the first successor. This essentially produces a perfectly ordered tree. However, the ordering
produced by the artificial tree generator is of no concern since the BRANCHGENERATE routine
will be followed by a BRANCHORDER routine whose sole purpose is to order the successors in
a more realistic manner.
In Figure 4.3, the artificial tree generator has been incorporated into an ALPHABETA search
routine. At a leaf node, the value of pscore is returned as the node's score. At all other
nodes, BRANCHGENERATE is called to generate the node's successors. Once the execution of
BRANCHGENERATE is complete, BRANCHORDER is called to order the successors. A possible
BRANCHORDER routine is described in Section 4.3.3.
Note that the entire tree is dependent on only two values: the Root node's seed and
pscore. In the experiments conducted, different seeds are used to generate different trees;
however, the pscore value is fixed. The value of the pscore variable is fixed to the value
(score_max − score_min)/2. If the Root node's pscore is too low, then the range used at the
root, [score_min, pscore], will be too small. Similarly, if the value of pscore is too high, then
the first successor's range, [pscore, score_max], will be too small. Therefore, a value for pscore
somewhere in the middle of the range of scores that are possible offers the greatest flexibility.
In the experiments described in this text, two sets of seeds are used to generate artificial trees.
The sets will be referred to as tg1 and tg2. The first set, tg1, contains 100 seeds while the
second set, tg2, contains 200 seeds. Both sets are described in Appendix 1.
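A compact Python sketch of the successor-generation scheme illustrates the key property claimed above: a (seed, pscore) pair determines the successors completely, so identical inputs yield identical trees regardless of expansion order. The names and the generator packaging below are illustrative assumptions of the sketch, not the thesis's code; only the constants and the range rules come from Figures 4.1 and 4.2.

```python
LIMIT = 2147483647                    # modulus of Figure 4.1
SCORE_MIN, SCORE_MAX = 0, LIMIT - 1   # smallest and largest scores

def lcg_stream(seed):
    """The generator pair of Figure 4.1: seed with one multiplier (16807),
    then iterate with another (48271)."""
    state = (16807 * seed) % LIMIT
    while True:
        state = (48271 * state) % LIMIT
        yield state

def generate_children(seed, pscore, node_is_max, width):
    """Sketch of BRANCHGENERATE: child i receives a (seed_i, pscore_i) pair.
    The first child inherits pscore; the rest draw from below it (max node)
    or above it (min node), so the first child is always best."""
    rnd = lcg_stream(seed)
    children = []
    for i in range(width):
        if i == 0:
            p = pscore
        elif node_is_max:
            lo, hi = SCORE_MIN, pscore
            p = next(rnd) * (hi - lo + 1) // LIMIT + lo
        else:
            lo, hi = pscore, SCORE_MAX
            p = next(rnd) * (hi - lo + 1) // LIMIT + lo
        children.append((next(rnd), p))   # (child seed, child pscore)
    return children
```

Since each child carries its own seed, any child's subtree can be expanded independently and in any order, which is exactly the third requirement of Section 4.2.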
BRANCHGENERATE(node)
1:  score_min ← 0
2:  score_max ← random_limit − 1
3:  node.rnd ← new random_type
4:  RANDOMSEED(node.rnd, node.seed)
5:  for i ← 1 to node.child_length
6:      node.child[i] ← new node_type
7:      if i = 1 then
8:          node.child[i].pscore ← node.pscore
9:      if node.type = max then
10:         node.child[i].type ← min
11:         if i ≠ 1 then
12:             node.child[i].pscore ← RANDOMRANGE(node.rnd, score_min, node.pscore)
13:     else
14:         node.child[i].type ← max
15:         if i ≠ 1 then
16:             node.child[i].pscore ← RANDOMRANGE(node.rnd, node.pscore, score_max)
17:     node.child[i].depth ← node.depth − 1
18:     node.child[i].seed ← RANDOM(node.rnd)

Figure 4.2: Generation of successors using the artificial tree generator.
ALPHABETA(node, alpha, beta)
 1: if node.depth = 0 then
 2:   return node.pscore
 3: BRANCHGENERATE(node)
 4: BRANCHORDER(node)
 5: if node.type = max then
 6:   score ← alpha
 7: else
 8:   score ← beta
 9: for i ← 1 to node.child.length
10:   if node.type = max then
11:     value ← ALPHABETA(node.child[i], score, beta)
12:     if value ≥ beta then
13:       return beta
14:     if value > score then
15:       score ← value
16:   else
17:     value ← ALPHABETA(node.child[i], alpha, score)
18:     if value ≤ alpha then
19:       return alpha
20:     if value < score then
21:       score ← value
22: return score

Figure 4.3: Incorporating the artificial tree generator into an alpha-beta search procedure.
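The two routines can be sketched together as a runnable Python translation. This is a rough sketch, not the thesis's implementation: the branching width, the score granularity, and the linear-congruential generator standing in for the unspecified random_type are all assumptions.

```python
# Sketch of Figures 4.2 and 4.3: a seeded artificial game tree searched with
# fail-hard alpha-beta. BRANCHORDER is omitted here for brevity.
RANDOM_LIMIT = 1024                      # assumed score granularity
SCORE_MIN, SCORE_MAX = 0, RANDOM_LIMIT - 1

def lcg(seed):
    """Deterministic pseudo-random stream, one per node (assumed generator)."""
    state = seed
    while True:
        state = (1103515245 * state + 12345) % (2 ** 31)
        yield state

def rand_range(rnd, lo, hi):
    """Uniform value in [lo, hi]."""
    return lo + next(rnd) % (hi - lo + 1)

class Node:
    def __init__(self, seed, pscore, depth, ntype):
        self.seed, self.pscore, self.depth, self.type = seed, pscore, depth, ntype
        self.child = []

def branch_generate(node, width=4):
    rnd = lcg(node.seed)
    for i in range(width):
        ctype = 'min' if node.type == 'max' else 'max'
        if i == 0:
            pscore = node.pscore             # first successor inherits the score
        elif node.type == 'max':
            pscore = rand_range(rnd, SCORE_MIN, node.pscore)
        else:
            pscore = rand_range(rnd, node.pscore, SCORE_MAX)
        node.child.append(Node(next(rnd), pscore, node.depth - 1, ctype))

def alpha_beta(node, alpha, beta):
    if node.depth == 0:
        return node.pscore                   # leaf: pscore is the score
    branch_generate(node)
    score = alpha if node.type == 'max' else beta
    for c in node.child:
        if node.type == 'max':
            value = alpha_beta(c, score, beta)
            if value >= beta:
                return beta
            score = max(score, value)
        else:
            value = alpha_beta(c, alpha, score)
            if value <= alpha:
                return alpha
            score = min(score, value)
    return score
```

Because every child's seed is drawn from the parent's stream, two searches started from the same root seed expand identical trees and return identical scores.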
CHAPTER 4. ARTIFICIAL GAME TREES
4.3 Modeling Branch Ordering

To mimic the branch orderings observed in practice, Marsland and Campbell [16] propose the model of a strongly ordered tree where the branches are ordered according to two rules:

• The first branch at any node is the best 70 percent of the time.
• There is a 90 percent chance that the branch with the best score is located within the first quarter of the branches at a node.
In [16], to generate a strongly ordered tree with a branching factor of 20, each branch is given a weight:
Each weight represents the probability of that move being chosen as the best at that node.

Experiments with the author's chess program, RajahX, indicate that the model of a strongly ordered tree is not an accurate representation of the branch orderings observed in practice. A new model, the exponentially ordered model, is introduced to overcome the deficiencies in the strongly ordered model. If necessary, the exponential model can be adjusted so that it satisfies the criteria of the strongly ordered model.
4.3.1 A Description of RajahX

Experience with RajahX has shown that the program is quite a strong chess player. It has competed successfully in two tournaments. At the 1996 Dutch Computer Chess Championships the program placed 13th in a field of 20 participants. After several modifications to the program, at the 1997 Aegon Man-Machine tournament the program finished 5th in a field of 100. Playing on a Pentium 166, the program's blitz rating hovers around the 2500 mark on the Internet Chess Club.
RajahX uses a negascout search routine with numerous enhancements. At a leaf node, a quiescence search routine is used to refine the node before it is evaluated. The structure of the tree generated by the program is not uniform. Variations with moves that seem "interesting" are searched more deeply in order to uncover any gains or losses that are outside the normal search depth. The program uses several methods to improve the efficiency of its search routine:
• Iterative deepening.
• Aspiration search.
• Killer table [16, 15]: A move that is found to produce a cut-off is stored in a killer table. If the move comes up again in another part of the tree, it should be placed near the front of the move list since it is likely to produce a cut-off again.
• History list [15]: A history list contains 4096 entries (64 × 64). It contains an entry for each (from-square, to-square) pair in chess. The entry indicates how often a move from one square to another is the best or is able to produce a cut-off at a node. This table is maintained as the search progresses and is used to order the branches at every node in the tree.
• Transposition table [16, 15]: The score determined for each position encountered during the search can be stored in a hash table that is accessed using a key generated from that board position. If the position arises again through a different sequence of moves, the table can immediately provide a score for the position and search is completely avoided.
• Null move search [3]: In chess, a "pass" is not a legal move. However, if such a move was legal, in most positions, the player on move would not pass since the other player could potentially improve his position greatly. This observation can be used during the search as follows. At a node in the tree, if the player on move makes a pass and the resulting score is higher than the upper bound, then making a non-pass move would result in a significantly higher score. Therefore, when the score for a pass type move is higher than the upper bound, the node can be cut-off right away since it is likely to contribute nothing to the overall search effort.
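The history list above can be sketched in a few lines. This is a minimal illustration, not RajahX's actual code; the depth-squared increment is an assumption (a common choice for history heuristics):

```python
# Sketch of a 64x64 history list: moves that proved best or produced a
# cut-off accumulate weight and are tried earlier at later nodes.
class HistoryList:
    def __init__(self):
        # one entry per (from_square, to_square) pair: 64 * 64 = 4096 entries
        self.table = [[0] * 64 for _ in range(64)]

    def record(self, move, depth):
        """Credit a move that was best or produced a cut-off.
        Weighting by the square of the remaining depth is assumed here."""
        frm, to = move
        self.table[frm][to] += depth * depth

    def order(self, moves):
        """Sort a move list so historically strong moves come first."""
        return sorted(moves, key=lambda m: self.table[m[0]][m[1]], reverse=True)
```

Because a single table is shared by the whole search, a move that cuts off in one part of the tree is promoted everywhere it reappears.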
RajahX also implements a simple learning method [3]. Positions that are found to be problematic for the program are stored in a table and the program "learns" to avoid these positions when they arise again during tree search.
4.3.2 Quality of Branch Ordering in RajahX

To measure the quality of the first branch expanded at each node, a method suggested in [10] measures the frequency with which the first branch is the best or is able to produce a cut-off. This frequency is referred to as fbest. The method assumes that the branch that causes the cut-off is also the best. This assumption produces a highly optimistic estimate of the quality of branch ordering because a branch that produces a cut-off at a node is not necessarily the node's best. There is a large degree of uncertainty in the ranking of the cut-off branch among the other branches at the node, because an exact score is unavailable for that branch and several branches may be unexplored when the cut-off occurs. However, fbest still provides useful information about the tree structure. Specifically, fbest can be used as a criterion for comparing the trees produced by RajahX with the trees produced by the artificial tree generator.
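The measurement can be sketched as a simple tabulation: during the search, each contributing node records the 1-based index of the branch that was best or produced the cut-off, and the frequencies are computed afterwards. This is a simplified illustration of the method in [10], not its original code:

```python
# Tabulate fbest(n): how often the branch at position n was the best or
# produced the cut-off, over the nodes that contribute to the measure.
def fbest_table(branch_records, k=10):
    """branch_records: one 1-based branch index per contributing node.
    Returns pbest(n) = fbest(n) / number of contributing nodes, n = 1..k."""
    counts = [0] * (k + 1)
    for n in branch_records:
        if 1 <= n <= k:
            counts[n] += 1
    total = len(branch_records)
    return {n: counts[n] / total for n in range(1, k + 1)}
```

Nodes at which no branch can be singled out as best (for example, a maximizing node where every branch returns the lower bound) simply contribute no record.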
To compute the value of fbest for RajahX, a minor program modification was necessary. The negascout search algorithm is replaced by the simpler alpha-beta algorithm. Most of the time, the negascout algorithm searches with a null window. Cut-offs are more frequent in a null window search and the greater frequency of cut-offs would skew the value of fbest even further.
Statistics are collected while RajahX conducts searches on the Bratko-Kopec set of 24 test positions [13]. The program is asked to search each position for 10 seconds. Tables 4.1, 4.2 and 4.3 summarize the statistics collected. Not every node in the tree contributes to the computation of fbest. For example, at a maximizing node, if every branch returns a score equal to the lower bound then it is not clear which branch is the best. Furthermore, a cut-off never occurs at such a node. The second column in Table 4.1 gives the number of nodes that contribute to the computation of fbest. To determine the average branching factor, the number of pseudo-legal moves at every node that contributes to fbest is totaled. For efficiency reasons, most chess programs generate pseudo-legal moves. The set of moves generated is described as being pseudo-legal since the set may include illegal moves which leave the king in check. Moves that are illegal are detected and removed during the search process. The number of pseudo-legal moves is a good estimate of the average branching factor since illegal moves are not generated for most positions. However, this estimate is slightly higher than the real branching factor due to the inclusion of the illegal moves. The third column in Table 4.1 indicates the total number of pseudo-legal moves for each test position. Using the data in the second and third columns, the average branching factor is determined to be approximately 32.
An expanded version of the original method in [10] was implemented to obtain a more detailed view of the tree structure. The expanded version calculates the frequency with which a move in one of the first 10 positions is the best or is able to produce a cut-off. This produces a set of frequencies: fbest(1), fbest(2), ..., fbest(10). Table 4.2 presents the first five values of fbest and Table 4.3 presents the last five. The term pbest(n) refers to the probability with which a move in a position, n, is the best, and is obtained by dividing fbest(n) by the number of nodes that contributed to fbest(n) (Table 4.1).
For the trees searched by RajahX, the first branch has a very high probability (pbest(1) = 0.875521) of being the best or being able to produce a cut-off. On a perfectly ordered game tree, pbest(1) = 1.00. The branch ordering techniques used by RajahX help it achieve something quite close to a perfectly ordered tree. In the next section, the challenge is to reproduce the behavior observed in RajahX using artificially generated trees. If the artificially generated trees are to be used as a model of the trees encountered in practice, they should produce similar values for pbest(1), pbest(2), ..., pbest(10).
Table 4.1: A count of the nodes that contribute to fbest and the average branching factor in the Bratko-Kopec test positions.
Table 4.2: The first five values of fbest derived from the search of the Bratko-Kopec test positions.
Table 4.3: The last five values of fbest derived from the search of the Bratko-Kopec test positions.
4.3.3 An Exponential Branch Ordering Model

Imagine a function whose sole purpose is to order the branches at a node in a best to worst order. Although such perfect ordering is impossible, the function makes an educated attempt at creating a best to worst order. There are two branch lists: one is unordered and the other is ordered. Initially, the unordered list is full and has b (branching factor) members. The ordered list is empty. Until the unordered list is empty, the function moves branches, one at a time, from the unordered list into the ordered list. Branches must be carefully selected from the unordered list since they are being moved into an ordered list. To aid the selection process, the function uses domain dependent knowledge to predict which of the remaining moves in the unordered list is the best. In the exponential branch ordering model, two weights, wa and wb, represent the effect of the domain dependent knowledge on the resultant ordering. When selecting the first move, the knowledge assisted selection mechanism is successful in picking the best move with a probability of wa/100. For all other moves, the knowledge assisted selection mechanism is successful in picking the best remaining move from the unordered list with a probability of wb/100. When the domain dependent knowledge is not good enough to select the next best move, the probability that the move picked is still the best is given by 1/(b - n + 1). This is the probability when a simple random pick is used to select the n-th branch. The routine in Figure 4.4 produces an exponential branch ordering. Both the ordered and unordered lists are stored within the node.child array. When selecting the first move, the ordered list is empty and the entire array stores the elements of the unordered list. When selecting branch n, the elements in the range [1, n - 1] are members of the ordered list and elements in the range [n, b] are members of the unordered list.
An exponential branch ordering has the following properties:

• The ordering generated is controlled by three parameters: b, wa, and wb. Parameter b represents the branching factor. Parameters wa and wb represent the effect of the knowledge component in the ordering function and are values in the range [0, 100]. In Figure 4.4, the code in the first path (lines 8-19) selects the branch with the best score among the remaining branches, while the second path (line 21) selects one of the remaining branches at random. The selected branch is then placed in position i. Parameters wa and wb are used to determine which of the two paths is taken.
• The first branch at a node is best with probability p1, given by:

  p1 = wa/100 + (1 - wa/100)(1/b)    (4.1)
• A branch at position n, where 2 ≤ n ≤ b, is best with probability pn, given by:

  pn = (1 - p1 - p2 - ... - p(n-1)) [wb/100 + (1 - wb/100)(1/(b - n + 1))]    (4.2)
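A short computation shows how the two probabilities are evaluated in practice; for b = 20, wa = 69 and wb = 19 it reproduces the figures quoted for the strongly ordered parameters in Table 4.5 (the function name is ours):

```python
# Probabilities that the branch at each position is best, for an
# exponential branch ordering with parameters b, wa, wb.
def pbest_curve(b, wa, wb):
    """p[n] = probability that the branch at 1-based position n is best."""
    p = [0.0] * (b + 1)
    p[1] = wa / 100 + (1 - wa / 100) * (1 / b)        # Equation 4.1
    for n in range(2, b + 1):
        remaining = 1 - sum(p[1:n])                   # best not yet placed
        p[n] = remaining * (wb / 100 + (1 - wb / 100) * (1 / (b - n + 1)))
    return p
```

For b = 20, wa = 69, wb = 19 this gives p1 = 0.7055 and a cumulative probability of about 0.899916 for the first five branches, and the probabilities over all b positions sum to 1.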
Table 4.4: Values of fbest and pbest derived from the search of 100 artificially generated trees.
We now address the question of whether an exponentially ordered model is an adequate representation of the trees encountered in practice. In other words, can an exponentially ordered tree produce the same values for fbest and pbest as the trees searched by RajahX? Table 4.4 illustrates the values of fbest and pbest collected during an alpha-beta search of 100 artificially generated trees (generated using set tg1). Each tree was searched to a depth of 7. The exponential ordering employed by the artificial tree generator used the parameters: b = 32, wa = 79 and wb = 5. Figure 4.5 compares the probability curve produced by RajahX (generated using Tables 4.2 and 4.3) with the probability curve produced by the artificial tree generator. Clearly, when the appropriate parameters are used, the trees produced by an exponential ordering closely resemble the trees searched by RajahX.
The exponential model can satisfy the strongly ordered model's criteria when supplied with the right parameters. A strongly ordered tree with a branching factor of 20 is produced when the parameters b = 20, wa = 69 and wb = 19 are used. The data in Table 4.5 was generated using these parameters. The second column is the result of applying Equations 4.1 and 4.2 for various values of n; this column presents the probability with which a move in a position, n, is the best. A running total of the probabilities in the second column is maintained in the third column; this column presents the probability with which a move in one of the first n positions is the best. The probability of the first move being the best is 0.705500. This is slightly higher than the 0.70 required by the strongly ordered model. The best branch is within the first quarter of the branches at a node with a probability of 0.899916. This figure is slightly lower
BRANCHORDER(node)
 1: for i ← 1 to node.child.length
 2:   if i = 1 then
 3:     ws ← wa
 4:   else
 5:     ws ← wb
 6:   w ← RANDOMRANGE(node.rnd, 0, 99)
 7:   if w < ws then
 8:     if node.type = max then
 9:       v ← score_min - 1
10:       for j ← i to node.child.length
11:         if node.child[j].pscore > v then
12:           p ← j
13:           v ← node.child[j].pscore
14:     else
15:       v ← score_max + 1
16:       for j ← i to node.child.length
17:         if node.child[j].pscore < v then
18:           p ← j
19:           v ← node.child[j].pscore
20:   else
21:     p ← RANDOMRANGE(node.rnd, i, node.child.length)
22:   swap node.child[i].pscore ↔ node.child[p].pscore

Figure 4.4: A routine that produces an exponential branch ordering.
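The routine can be rendered in Python as follows. This is a sketch: Python's random module stands in for the node-local generator, and the list of pscore values stands in for the node.child array:

```python
import random

# Sketch of Figure 4.4: a selection-sort style ordering in which each step
# either picks the best remaining branch (knowledge path) or a random one.
def branch_order(pscores, node_type, wa, wb, rnd=random):
    """Reorder pscores in place to produce an exponential branch ordering."""
    n = len(pscores)
    for i in range(n):
        ws = wa if i == 0 else wb
        if rnd.randrange(100) < ws:
            # knowledge-assisted path: pick the best remaining branch
            if node_type == 'max':
                p = max(range(i, n), key=lambda j: pscores[j])
            else:
                p = min(range(i, n), key=lambda j: pscores[j])
        else:
            # random path: 1/(b - i) chance of still picking the best
            p = rnd.randrange(i, n)
        pscores[i], pscores[p] = pscores[p], pscores[i]
```

With wa = wb = 100 the knowledge path is always taken and the result is a perfect best-to-worst order; with wa = wb = 0 the result is a random permutation.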
Figure 4.5: Comparing the values of pbest from RajahX with the values of pbest from artificially generated trees.
Table 4.5: An exponential ordering with parameters b = 20, wa = 69 and wb = 19.
than the required value of 0.90. Highly precise values for wa and wb could be used to produce the exact values required by the strongly ordered model, but the routine in Figure 4.4 uses integer arithmetic and it only accepts integral values in the range [0, 100] for wa and wb. This is only a minor limitation and it does not have a significant impact on the performance experiments to be described in Chapters 5 and 6.
The performance experiments in this work will consider five different branch orderings as generated by the following parameter sets:
Figure 4.6: The probability curves generated by the five parameter sets used in the experiments.
The first parameter set generates trees that resemble the trees searched by RajahX, while the second generates strongly ordered trees. The last three parameter sets are designed to illustrate the effect of decreasing branch ordering accuracy. The fifth parameter set generates trees that are randomly ordered. Equations 4.1 and 4.2 can be used to compute the probability curves for the five branch orderings; the resulting curves are plotted in Figure 4.6.
Chapter 5
Neural Network Based Prediction
5.1 Introduction
In a parallel alpha-beta search, a split-point must be chosen carefully if large increases in search overhead are to be avoided. If a node that will be cut-off is selected as a split-point, useless work will be performed when many processors collaborate on the node. On the other hand, a node that examines all of its successors makes a perfect split-point. In addition, for efficiency reasons, a node should be selected as a split-point only when it is reasonably certain that its score will not change further.

Given a large set of training data, a simple feed-forward neural network [7] can be taught to approximate a wide variety of functions. For the purposes of a parallel alpha-beta searcher, a feed-forward neural network can be taught to predict when a node's score is going to stabilize and whether a node is going to cut-off. A neural network that is capable of performing these two tasks is presented here; its application to parallel alpha-beta search is examined using both sequential and parallel experiments.
5.2 Neural Network Inputs and Outputs

The neural network is called as a new child node is being generated. Three inputs are required by the network:

• pv: The value of this input is 1 if the parent node is among the first nodes to be expanded by the search (i.e. a PV node). It is 0 otherwise.
• type: This input represents the parent node's type and it is a value in the range [0, 100].
• index: The position of the child being expanded among the other children at this node.

The network produces two outputs:

• type: Represents the child node's type and it is a value in the range [0, 100].
• wait: This output represents the number of moves that need to be searched before the score at the child node stabilizes.
Strong YBWC (Section 3.4) classifies a node into one of three types. Furthermore, in YBWC, given a node and its type, one can determine the type for the node's children as well. The network presented here performs a similar task: given the parent's type, the network generates the child's type. Note that a node's type may be one of 101 different types. This generality is necessary to allow the neural net to discover interesting patterns in the training data.

A parallel alpha-beta searcher can wait until the number of moves specified by the output, wait, is searched sequentially before trying parallel search. However, using the output, type, is slightly more involved. If the value for a node's type falls within a range of numbers that are known to represent cut-off nodes, extra care can be used before the node is selected as a split-point, or the node can be avoided all together.
5.3 Generating Training Data
-4 single piece of training clata (also referrecl to as n pattern) consists of a group of input values
and the espected output values for the given inputs- Generating training data is ci. three stcp
process. First. a basic secluential alpha-beta se;lrch is modifieci to generate the following data
at every nocle:
( p l . P.). . . . . I I r L - 1 ) : S pecifies the ~adclress" of the nocle in the tree. Herc. n is eqiial to the
heiglit of the tree bcing searcliecl. Each p value specifies the move that was selected ut the
trce level indicated bj- the subscript. For csample. the node labeled B in Figure 5.1 lias
the aclciress: (1.2.2). FVlien adclressing a xocle. if a particular lewl has not been reached
as yet. the corresponcling p value is set t a O. For esample, the node labeled A hi= the
zdclress: (2.1,O).
t: A type value czssociated with the node, If a cut-off occurs at the node. then its type
value is 50. If a riocle updates its score b u t does not cut-off, then its type value is O. If a
node cloes not upclate its score and does mot cut-offt then its type value is 100.
u: The incles of the inove a t which the Enal score update occurred or the indes of the
move that produced a cut-off.
Values for (p1, p2, ..., p(n-1)), t and u are collected as the 200 trees from set tg2 are searched to a depth of 4.

Figure 5.1: Node addressing.

Since the data is collected by traversing several different trees, there may be several different values of t and u for a node address (i, j, k). This problem is rectified in the second step by computing the average value of t and u for each node address in the tree. The third and final step defines a mapping between adjacent levels in the tree. Let us consider how a mapping may be formed between the second and third levels in the tree. A node at the second level of the tree has an address of the form (i, 0, 0), i ≠ 0. The average values of t and u at this node are denoted t_avg(i, 0, 0) and u_avg(i, 0, 0), respectively. A node at the third level of the tree has an address of the form (i, j, 0), i ≠ 0, j ≠ 0. A mapping between a parent at the second level and a child at the third level can be defined as follows:

(t_avg(i, 0, 0), u_avg(i, 0, 0)) → (t_avg(i, j, 0), u_avg(i, j, 0))

This mapping is computed for every possible value of i and j. The same process is applied to define a mapping between the nodes at the first and second levels and between the nodes at the third and fourth levels.
The data generated by the three-step process just described can be used as training data for a neural network. The task of the neural network is to discover the function that maps a pair of values at a parent into a pair of values at the child. For additional accuracy, a boolean value, pv, is also used as input to the function. This boolean value specifies whether the parent node is among the first nodes to be expanded by the search. Using the notation of Section 5.2, the task of the neural network is to discover the mapping that produces (type, wait) as an output given (pv, type, index) as an input.

Five sets of training data are generated, one for each of the five ordering types that are considered in this text (see Section 4.3.3).
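The second and third steps can be sketched as follows. This is an illustration with hypothetical data structures, not the thesis's implementation: addresses are tuples padded with zeros, and pv marks first-sequence parents, as in the text:

```python
from collections import defaultdict

# Sketch of steps 2 and 3 of the training-data pipeline: average the
# (t, u) samples per node address, then pair parent averages with child
# averages to form (pv, type, index) -> (type, wait) patterns.
def average_by_address(samples):
    """Step 2: average the (t, u) values recorded for each node address."""
    acc = defaultdict(list)
    for addr, t, u in samples:
        acc[addr].append((t, u))
    return {addr: (sum(t for t, _ in vals) / len(vals),
                   sum(u for _, u in vals) / len(vals))
            for addr, vals in acc.items()}

def training_pairs(averages, pv_addresses):
    """Step 3: map each parent's averages to each child's averages."""
    pairs = []
    for addr, (t, u) in averages.items():
        if 0 not in addr:
            continue                          # deepest level: no children recorded
        level = addr.index(0)                 # first unreached level
        for caddr, (ct, cu) in averages.items():
            # a child shares the parent's prefix, is nonzero at `level`,
            # and is zero everywhere below it
            if (caddr[:level] == addr[:level] and caddr[level] != 0
                    and all(x == 0 for x in caddr[level + 1:])):
                pv = 1 if addr in pv_addresses else 0
                pairs.append(((pv, t, caddr[level]), (ct, cu)))
    return pairs
```

Each resulting pair is one training pattern: the input triple (pv, parent type, child index) and the expected output pair (child type, child wait).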
5.4 Neural Network Structure and Back-Propagation

The structure of the neural network used is shown in Figure 5.2. It consists of three layers: an input layer (i1, i2, i3, ib), a hidden layer (h1, h2, ..., hp, hb) and an output layer (o1, o2). The nodes ib and hb are bias neurons which always output 1. Except for the bias neurons and the neurons in the input layer, the output of any other neuron is given by the following equation:

f(β) = 1 / (1 + e^(-β))
Figure 5.3 provides a plot of f(β) for -6 ≤ β ≤ 6. If β is a large negative value, the function returns a value very close to 0. However, if β is a large positive value, the function returns a value close to 1. For the j-th neuron in the hidden layer, β is given by:

β = Σ_k wh_j,k i_k + wh_j,b

Similarly, the value of β for the k-th neuron in the output layer is given by:

β = Σ_j wo_k,j h_j + wo_k,b
A weight value, wh or wo, is associated with each input to a neuron and it is used to scale the input before it is summed. Note that each neuron has its own set of weights; one neuron's set of weights is not necessarily the same as another neuron's. The weights wh_j,b and wo_k,b are used to scale the bias input.
The output of a neuron is a value between 0 and 1. In keeping with this convention, the inputs to the neural net are scaled and shifted to fit the range [0.1, 0.9] before being placed in the input layer. Scaled and shifted versions of pv, type and index are placed in neurons i1, i2 and i3 respectively. Similarly, the values in the output layer are scaled and shifted from the range [0.1, 0.9] into the range required by the output variables.
To train the neural network, a set of input values is applied to the network. The network's outputs are compared to the expected outputs to determine the error associated with the outputs. The error associated with the k-th output neuron is given by:

e_k(p) = d_k(p) - o_k(p)

Here, o_k(p) represents the output of the k-th output neuron for the input pattern p and d_k(p) represents the desired output for the k-th output neuron. Error in the neural network output
can be reduced by applying the back-propagation method [7] to adjust the weights in the hidden and output layers. The adjustment to the weights at the k-th neuron in the output layer is given by:

Δwo_k,j = η γ_k(p) h_j(p)    (5.6)
Figure 5.2: Neural network structure.

Figure 5.3: The threshold function used for each neuron in the neural network. f(β) = 1/(1 + e^(-β)).
Here, η is the learning rate and γ_k(p) is given by:

γ_k(p) = o_k(p) (1 - o_k(p)) (d_k(p) - o_k(p))

Similarly, the weights for the j-th neuron in the hidden layer are adjusted using:

Δwh_j,k = η δ_j(p) i_k(p)
Although δ_j(p) and γ_k(p) play similar roles, the computation of δ_j(p) is more involved:

δ_j(p) = h_j(p) (1 - h_j(p)) Σ_k γ_k(p) wo_k,j
Table 5.1: Average squared error for the final neural network obtained after training.
A training iteration consists of feeding every training input to the network. Any error in the output is used to adjust the network weights. As the iterations progress, the average squared error in each output is measured. The average squared error for output k is given by:

E_k = (1/|P|) Σ_{p∈P} (d_k(p) - o_k(p))²

Here, P is the set of all training patterns. One neural network was created for every type of ordering considered. Each network was trained over 10000 iterations and the learning rate, η, was set to 0.1 during the training. Table 5.1 summarizes the average squared error for the final network obtained after training.
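The update rules above can be sketched as a minimal network. This is a simplified illustration with assumed sizes (3 inputs, 4 hidden neurons, 2 outputs, plus bias units), not the thesis's implementation:

```python
import math, random

# Minimal feed-forward network trained with the back-propagation updates
# given above: sigmoid units and bias inputs that always supply 1.
def sigmoid(beta):
    return 1.0 / (1.0 + math.exp(-beta))

class Net:
    def __init__(self, n_in=3, n_hid=4, n_out=2, seed=1):
        rnd = random.Random(seed)
        # one extra weight per neuron scales the bias input
        self.wh = [[rnd.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hid)]
        self.wo = [[rnd.uniform(-0.5, 0.5) for _ in range(n_hid + 1)]
                   for _ in range(n_out)]

    def forward(self, x):
        h = [sigmoid(sum(w * v for w, v in zip(ws, x + [1.0])))
             for ws in self.wh]
        o = [sigmoid(sum(w * v for w, v in zip(ws, h + [1.0])))
             for ws in self.wo]
        return h, o

    def train_pattern(self, x, d, eta=0.1):
        h, o = self.forward(x)
        # gamma_k = o_k (1 - o_k) (d_k - o_k)
        gamma = [ok * (1 - ok) * (dk - ok) for ok, dk in zip(o, d)]
        # delta_j = h_j (1 - h_j) sum_k gamma_k wo_{k,j}
        delta = [hj * (1 - hj) * sum(g * ws[j] for g, ws in zip(gamma, self.wo))
                 for j, hj in enumerate(h)]
        for k, ws in enumerate(self.wo):       # dwo_{k,j} = eta gamma_k h_j
            for j, hj in enumerate(h + [1.0]):
                ws[j] += eta * gamma[k] * hj
        for j, ws in enumerate(self.wh):       # dwh_{j,k} = eta delta_j i_k
            for k2, xk in enumerate(x + [1.0]):
                ws[k2] += eta * delta[j] * xk
```

Repeatedly presenting a small pattern set drives the average squared error in the outputs down, which is the quantity reported in Table 5.1.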
5.5 Performance of Node Classification Schemes
Consider three different schemes that predict whether a node will cut-off or will examine all its successors. The first scheme, which will be referred to as scheme A, is taken from strong YBWC. A node in this scheme is classified as follows (Section 3.4):
• The root node is of type Y-PV. The first successor at a Y-PV node is of type Y-PV, while the rest of the successors are Y-CUT.
• The first successor to a Y-CUT node is a Y-ALL node, while the rest are Y-CUT nodes.
• All successors at Y-ALL nodes are Y-CUT nodes.
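Scheme A's type propagation reduces to a small rule, sketched here as a minimal illustration of the bullets above:

```python
# Scheme A (strong YBWC): derive a child's type from the parent's type and
# whether the child is the parent's first successor. The root is 'Y-PV'.
def scheme_a_child_type(parent_type, child_index):
    """child_index is 1-based."""
    if parent_type == 'Y-PV':
        return 'Y-PV' if child_index == 1 else 'Y-CUT'
    if parent_type == 'Y-CUT':
        return 'Y-ALL' if child_index == 1 else 'Y-CUT'
    return 'Y-CUT'                 # all successors of a Y-ALL node
```

Applying the rule along the leftmost path from the root yields the PV; everywhere else the types alternate between CUT and ALL exactly as on a perfectly ordered tree.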
The second scheme, scheme B, is derived from the description of the DTS algorithm (Section 3.5):

• A node that has the same alpha and beta values as the Root is classified as D-PV.
• A minimizing node with the same beta as the Root or a maximizing node with the same alpha as the Root is classified as D-CUT.
• Any node that doesn't fit the first two categories is classified as a D-ALL node.
• If more than some number of moves, n, have been searched at a D-CUT node without an occurrence of a cut-off, then the type of the node is changed to D-ALL.
• Two D-ALL nodes are allowed at subsequent levels in the tree exactly once. Following the two D-ALL nodes, if they exist, nodes at subsequent levels are forced into an alternating sequence of D-CUT and D-ALL nodes.
Both scheme A and scheme B have types that correspond to the ALL and CUT types in a perfectly ordered game tree. Although PV nodes are named differently in each scheme, a node that would be classified Y-PV in scheme A would be classified as D-PV in scheme B. Each scheme considers its PV type to be the first sequence of nodes expanded.

In scheme B a node may change from being a D-CUT node to a D-ALL node. In this case, only the initial prediction is used in determining performance. Note that when the D-CUT node changes, it affects the classification of nodes that are below it in the tree.
The final scheme, scheme C, is an application of the neural networks that were described in the preceding section. In this scheme, a node may be classified as being one of 101 types. The neural network generates a type value for a child node given the parent's type value and the location of the child among the parent's other children. If the type value for a node falls within a range [50 - m, 50 + m] centered around 50, the node is defined to be an N-CUT node. If the type falls outside this range, the node is said to be an N-ALL node. If a node is among the first nodes to be expanded by the search, the neural network output is not used and the node is defined to be an N-PV node.
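Scheme C's thresholding reduces to a small predicate, sketched below; `net_type` stands for the neural network's type output, a value in [0, 100]:

```python
# Scheme C: classify a node from the network's type output using the band
# [50 - m, 50 + m]; nodes on the first sequence bypass the network.
def scheme_c_classify(net_type, m, is_first_sequence=False):
    if is_first_sequence:
        return 'N-PV'                  # network output unused on the PV
    if 50 - m <= net_type <= 50 + m:
        return 'N-CUT'
    return 'N-ALL'
```

Note that the parameter m tunes how aggressively nodes are treated as CUT: with m = 50 every non-PV node is N-CUT, which is exactly the degenerate behavior described below for the (32, 0, 0) ordering.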
To compare the performance of schemes A, B and C, each scheme was used to categorize the internal nodes of 100 artificially generated trees. In each tree, the first sequence of nodes expanded (PV nodes) is not included in the performance measure because all three schemes generate the same type for these nodes. The trees were generated using set tg1 and were searched to a depth of 6. Tables 5.2, 5.3 and 5.4 summarize the results of the experiments. In each table, the True_CUT column indicates how often the prediction was correct for a CUT type node (Y-CUT, D-CUT or N-CUT). In other words, this column indicates how often a CUT type node was found to cut-off without having searched all of its branches. Similarly, the True_ALL column indicates how often an ALL type node (Y-ALL, D-ALL or N-ALL) searched all of its branches. On the other hand, the False_CUT column indicates the number of CUT nodes that did not cut-off and the False_ALL column indicates the number of ALL nodes that did not search all of their branches. The Error column is obtained by summing the number of mispredicted CUT nodes and the number of mispredicted ALL nodes. When using scheme B, a value must be specified for parameter n. Recall that n is the number of moves after which a
Table 5.2: Performance of Scheme A.
D-CUT node changes into a D-ALL node. The second column in Table 5.3 specifies the value of n that was found to give the lowest prediction error for a given ordering. As in scheme B, scheme C requires a value for parameter m. Recall that this parameter defines a range of type values [50 - m, 50 + m] that are used to check if a node is of the CUT type. The second column in Table 5.4 specifies the value of m that was found to give the lowest prediction error.

The lowest error, regardless of branch ordering, is exhibited by scheme C. Scheme A, which is based on strong YBWC, has the next lowest error. The worst performance is exhibited by scheme B. Scheme B obtains its best results with n = 1. When n = 1, scheme B employs a rule that is quite similar to a rule used by scheme A. With n = 1, when one successor has been examined at a D-CUT node without an occurrence of a cut-off, the node's type is changed to D-ALL. A successor to this new D-ALL node will be of the D-CUT type if the rule regarding two consecutive D-ALL nodes is used. Note the similarity between this and scheme A's rule for Y-CUT nodes. In scheme A, the first successor to a Y-CUT node is of type Y-ALL while the rest are of type Y-CUT.
In Table 5.4, note that the last row has two entries that are 0. This means that for ordering
(32,0,0) scheme C does not use the N-ALL classification at all! Consider what happens in
scheme A for ordering (32,0,0). Scheme A places 2227973 nodes into the Y-ALL category.
However, only 971304 were found to be true Y-ALL nodes. Scheme C takes the approach of
eliminating the N-ALL category altogether, thereby eliminating the problem of inaccurate
classification. Although this may seem simplistic, it is a safer approach for a parallel alpha-
beta searcher since a CUT node is subject to more stringent rules before it is picked as a site
for parallel search.
Figure 5.4 presents 5 bar graphs comparing the performance of schemes A, B and C. Scheme
B has a significantly higher error than either scheme A or C for all orderings considered. The
performances of schemes A and C are relatively close to each other except for those orderings
where wb is of reasonable size. For example, when the ordering (32,66,66) is considered, scheme
Table 5.3: Performance of Scheme B.
Table 5.4: Performance of Scheme C.
[Bar graphs of prediction error (x 10^5) for schemes A, B and C under several orderings, including (20,69,19) and (32,0,0).]
Figure 5.4: Comparing the performance of schemes A, B and C.
A produces an error value of 314338 while scheme C produces an error value of 149611. In this
case, scheme C produced an error value that is over 40 percent lower than that produced by
scheme A. In Figure 5.5, the improvement of C over A is plotted against the value of wb that
produced it. For higher values of wb, scheme C dominates scheme A by a significant margin.
Figure 5.5: Comparing the performance improvement of scheme C over scheme A for various
values of wb.
Chapter 6
Experiments on a Parallel
Alpha-Beta Simulator
6.1 Introduction
In Section 5.5, the performance of three different node classification schemes was considered
within the framework of a sequential alpha-beta searcher. Here, the three schemes from that
section are extended to create three split-point selection schemes whose performance is studied
using a parallel alpha-beta simulator. The parallel alpha-beta simulator (PABSim) is a partial
implementation of the DTS method described in Section 3.5 on a simplified shared memory
multiprocessor system. The implementation is partial in the sense that there is no split-point
selection mechanism in PABSim. It is the responsibility of the user to specify the details of the
split-point selection mechanism. This facility will be used to compare the performance of the
three split-point selection schemes.
6.2 A Simplified Shared Memory Multiprocessor
The simplified multiprocessor that is considered here has 48 processors. An illustration of the
system is presented in Figure 6.1. Each processor has a portion of the total memory in the
system. A cache is attached to each processor primarily to provide fast access to both local and
remote memory. Each cache line is 128 bytes wide. Memory in the system is cache coherent and
coherency is ensured using the 3-state MSI protocol in the caches along with a directory protocol
[2] in the memory units. Details of how the processors are interconnected and the nature of the
interconnection medium do not have a significant impact on the parallel alpha-beta searcher
that is going to be described, thus this issue will not be considered further. However, the timing
of the various types of memory accesses is of great significance and this topic is treated next.
[Diagram: processors, each with an attached cache, connected to the memory units through a processor interconnect.]
Figure 6.1: The simplified shared memory multiprocessor system.
Timing in the simplified multiprocessor system is expressed in ticks. The value of a tick is
not important as long as all timing in the system is expressed in ticks. A tick is defined to be
the length of time it takes for a block of memory (cache line) to return from the local memory
unit.
A block in the cache may be in one of three states: modified (M), shared (S) or invalid (I).
Furthermore, to ensure cache coherency, the home node has a flag indicating whether a block
is clean or dirty along with a list of nodes that have a copy of the block. The delays associated
with reading a block that is in one of several possible states are given in Table 6.1. For example,
when reading a block that is either in state M or state S, the read can be performed directly
from the cache and there is virtually no delay. If the block is in state I and is flagged as being
clean at the home node, it can be read in 1 or 2 ticks. The read takes 1 tick if the block is local
and 2 ticks if the block is from a remote node. However, if the block is flagged as being dirty,
the block has to be read from the node that modified the block and extra delays are incurred.
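The read-delay rules just described can be captured in a small lookup function. The sketch below is illustrative only: the function name is hypothetical, and the tick values for the dirty cases are assumptions (one extra tick over the clean case), since the text only states that dirty reads incur extra delays.

```python
# Illustrative model of the read-delay rules described above.
# Cache states: 'M', 'S' or 'I'; home flag: 'clean' or 'dirty'.
def read_delay(cache_state, home_flag, local):
    """Return the delay in ticks for a cache-line read (a sketch)."""
    if cache_state in ('M', 'S'):
        return 0                  # hit: read directly from the cache
    if home_flag == 'clean':
        return 1 if local else 2  # clean miss: fetch from home memory
    # Dirty miss: the line must come from the modifying node, so an
    # extra tick over the clean-case delay is assumed here.
    return (1 if local else 2) + 1
```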
A set of delays for block writes is presented in Table 6.2. If a block is in state M in the cache,
it can be modified without any delay. However, if a block is in state S or state I, invalidations
have to be sent to all processors that have a cached copy of the block.
The cache in the simplified multiprocessor is infinite in size. This simplification means that
capacity and conflict misses do not occur in the caches. However, this assumption produces
results that are drastically different from reality if the program being studied has a large working
set. Fortunately, in the case of an alpha-beta search routine, the working set is quite small. A
single block is usually enough to store all the data at a node. For example, a chess move is
represented using 2 bytes of memory: 6 bits are used to represent the square that the piece is
moving from, another 6 bits are used to represent the square that the piece is moving to and
the remaining 4 bits are used as flags. On average, a chess position has 32 moves available,
thus 64 bytes are enough to represent the move list at a node. A few extra bytes are required
to hold the value of alpha, beta and any other temporary variables. If the height of the tree is
6 and the data at any node can be stored in a single block of memory, 6 blocks are enough to
Cache State   Home State   Local/Remote   Delay (ticks)
M or S        -            -              0
I             Clean        Local          1
I             Clean        Remote         2
I             Dirty        Local          2
I             Dirty        Remote         3

Table 6.1: Delay for a memory read under various conditions in the simple multiprocessor.
Table 6.2: Delay for a memory write under various conditions in the simple multiprocessor.
(The table lists the write delay in ticks for each combination of cache state (M, S or I), home
state (dirty or clean) and home node location (local or remote); a block in state M is written
with no delay.)
store the data for the nodes along the path from the root to the leaf of the tree. Therefore, the
working set is no more than 6 blocks.
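The 2-byte move encoding described above amounts to bit-packing two 6-bit square indices and a 4-bit flag field into a 16-bit word. The following Python sketch illustrates the layout; the helper names are hypothetical and not part of PABSim.

```python
# Pack a chess move into 16 bits, as described above: 6 bits for the
# from-square, 6 bits for the to-square and 4 bits for flags.
def pack_move(frm, to, flags=0):
    assert 0 <= frm < 64 and 0 <= to < 64 and 0 <= flags < 16
    return frm | (to << 6) | (flags << 12)

def unpack_move(m):
    # Recover (from-square, to-square, flags) from the 16-bit word.
    return m & 0x3F, (m >> 6) & 0x3F, (m >> 12) & 0xF
```

A move list of 32 such moves then occupies 64 bytes, half of a 128-byte cache line.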
6.3 Multiprocessor Simulation
Normally a multiprocessor simulator and the program being run on the simulator are two sep-
arate entities. However, the approach that is taken in PABSim is to embed the multiprocessor
simulation within the code that performs the parallel alpha-beta search. Before detailing the
simulation process itself, a description of fibers [19] is given. Fibers are of primary importance
to the simulation process because they provide the mechanism by which several processors are
simulated on a uniprocessor system.
6.3.1 Fibers
Most operating systems today provide support for threads. A process running on such an
operating system may create threads that are then scheduled by the operating system. All
threads have access to the data within the process. The Win32 subsystem that is found in
Microsoft's Windows 98 and Windows NT operating systems provides support for threads.
Starting with version 4.0 of the subsystem, support is also provided for an entity referred to
as a fiber. Fibers are similar to threads except that they are scheduled by the user rather
than the operating system. In a multiprocessor simulation, a fiber can be used to simulate the
activity of a processor in the system. Fibers are more natural than threads for a multiprocessor
simulation because precise control is required over the process of selecting which processor to
simulate next.
Only three Win32 functions are needed to perform the basic fiber operations. A fiber can be
created with a call to CREATEFIBER. The function takes three arguments: a value for the size
of the stack used for the fiber, a pointer to a function where the fiber begins execution and a
parameter to be passed to the function where execution begins. The function returns a handle
to the newly created fiber. During cleanup, a fiber can be deleted with a call to DELETEFIBER.
A fiber that is executing can suspend itself and can schedule another fiber for execution with a
call to SWITCHTOFIBER. The function takes the handle of the fiber that is to be scheduled as
an argument.
6.3.2 Simulation Support Routines
Five routines are defined to help create multiprocessor simulations. The first function, YIELD,
advances the simulation time for the fiber that calls the function and allows the execution of
other fibers that are waiting to run. Scheduling of the other fibers is done in a round robin
fashion. That is, if fiber 1 calls the YIELD function, fiber 1 is suspended and fiber 2 is executed.
In order to model accesses to memory blocks, two functions are provided: CREAD and CWRITE.
A read request to a memory block can be simulated using a call to CREAD and a write request
to a memory block can be simulated using a call to CWRITE. Both functions properly model
the state transitions for the block in the cache and the home node. The functions also account
for the delays associated with reading and writing a block as indicated in Tables 6.1 and 6.2.
For example, if a fiber executes the CREAD function to access a block that is set to state I in
the cache, there will be delays in accessing the block. When such delays occur inside CREAD or
CWRITE, a call is made to YIELD to advance the simulation time for the fiber. Furthermore, the
call to YIELD suspends the fiber that initiated the read and another fiber is allowed to run. Both
CREAD and CWRITE account for memory block contention. If one processor requests access
to a memory block that is busy serving another processor's request, the request is rejected
and the processor that initiated the request will have to try again. Although CREAD and
CWRITE account for delays due to network traversal, they do not account for delays due to
network contention. Recall that the network connecting the processors was left unspecified
in Section 6.2. There may be delays due to network contention depending on the network
structure used. Both CREAD and CWRITE assume that they can access the network as soon as
a memory request requires communication with a remote node. The last two support routines
allow the creation of regions that are accessible to one fiber at a time. A lock is essentially
a block of memory that is modified atomically. A test-and-set operation is used to create the
lock entry function, LOCKENTER. The converse of the lock entry function is the LOCKLEAVE
function that is used to release a lock.
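The round-robin behaviour of YIELD can be mimicked on any platform that offers coroutines. The sketch below uses Python generators in place of Win32 fibers; the names and the trace-recording scheduler are hypothetical stand-ins, not PABSim code.

```python
# Round-robin scheduling of cooperating "fibers", sketched with Python
# generators: each fiber runs until it yields, then the next fiber in
# line is resumed, as the YIELD routine does.
def run_round_robin(fibers):
    order = []                  # record of which fiber ran at each step
    pending = list(fibers)      # list of (fiber id, generator) pairs
    while pending:
        still_running = []
        for fid, gen in pending:
            try:
                next(gen)       # resume the fiber until its next yield
                order.append(fid)
                still_running.append((fid, gen))
            except StopIteration:
                pass            # fiber finished; drop it from the rotation
        pending = still_running
    return order

def worker(steps):
    # A fiber that performs `steps` units of work, yielding after each,
    # in the spirit of calling YIELD once per simulated tick.
    for _ in range(steps):
        yield
```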
To illustrate how these simulation support routines are used, a simple matrix multiplication
example is examined next. The routine in Figure 6.2 multiplies two 64 x 64 matrices, a and
b, to produce a third matrix, c. It is assumed that four processors are executing the routine
illustrated with each processor's id (0, 1, 2 or 3) stored in tid. The rows in matrix c are divided
evenly among the four processors. Each processor is responsible for determining the values
to be placed in the rows assigned to it. All three matrices are shared by the four processors
performing the computation. Now, consider using the simulation support routines to simulate
the execution of the matrix multiplication routine on the simple multiprocessor system. In the
simulation, the matrix multiplication routine is executed by four fibers. Each fiber executes the
routine illustrated in Figure 6.3. The variables r, c, i and t fit into the same block of memory
and are local to each processor. The array BlockT contains the blocks where these variables
are stored. There is an entry in the array corresponding to each processor in the system. Since
each of these blocks is accessed by a single processor, a call to CREAD or CWRITE is not required
each time r, c, i or t is accessed. However, the initial write request to r needs to be modeled
since the block may not be within the cache of the processor requesting the write. After the
initial write, the block will remain in the cache in the correct state and all accesses to it will not
incur extra delays. Recall that the size of the cache is assumed to be infinite, thus the block
will never be replaced. If each value in the matrix is a four byte integer, 32 entries fit into a
single block of memory (recall that a block is 128 bytes wide). Assuming that the matrices are
aligned to block boundaries, the row and column counters can be used to determine the block
containing a matrix entry. The blocks corresponding to matrix entries in a, b and c are stored
in BlockA, BlockB and BlockC, respectively. A read or write request is sent to the appropriate
block as required (lines 7, 8 and 14). Note that a call to YIELD is made when time reaches 10
(line 12). The assumption is that time proportional to a tick has elapsed when time reaches
a count of 10. Consider what happens as four fibers execute the routine in Figure 6.3. If a
fiber executes any read or write request that has an associated delay, execution of the fiber is
suspended and another fiber is run. Similarly, if a fiber has executed a tick worth of operations,
it is suspended and another fiber takes over the simulation system for a tick. In this manner
the four fibers, each of which represents a processor in the multiprocessor system, execute on a
1: for r ← 16 · tid to 16 · (tid + 1) − 1
2:   for c ← 0 to 63
3:     t ← 0
4:     for i ← 0 to 63
5:       t ← t + a[r][i] · b[i][c]
6:     c[r][c] ← t

Figure 6.2: Parallel matrix multiplication.
uniprocessor system.
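The even row partitioning used in Figure 6.2 can be sketched in runnable form as follows. This is a sequential stand-in for the four processors (each "processor" is just a call with a different tid); the function name and parameters are illustrative, not part of PABSim.

```python
# Each "processor" tid computes rows (n/procs)*tid .. (n/procs)*(tid+1)-1
# of c, mirroring the row partitioning of Figure 6.2.
def multiply_rows(a, b, c, tid, n=64, procs=4):
    rows = n // procs
    for r in range(rows * tid, rows * (tid + 1)):
        for col in range(n):
            t = 0
            for i in range(n):
                t += a[r][i] * b[i][col]
            c[r][col] = t   # each row of c is written by exactly one tid
```

Because the row ranges are disjoint, no two processors ever write the same entry of c, which is why the pseudocode needs no lock around the final store.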
6.4 The Parallel Alpha-Beta Simulator (PABSim)
In the following description of PABSim, calls to the first three simulation support routines,
YIELD, CWRITE and CREAD, are omitted for clarity. These calls would never be present in any
parallel program; they are only present in PABSim for simulation purposes. Furthermore,
code that is used to generate the artificial tree is also omitted. Note that the artificial tree
generation code in PABSim is identical to the one described in Section 4.2. The resulting
description details the DTS-like parallel alpha-beta search procedure that forms the core of
PABSim without the added clutter of simulation support and artificial tree generation code.
6.4.1 Node Structure
The node structure is fundamental to all routines in PABSim. Figure 6.4 illustrates the fields
within the NodeT structure that is used to hold all node data. The first field, lock, is a lock
variable that controls access to the node. A node may be accessed by several processors at
the same time, therefore a lock is necessary to ensure that only one processor is allowed to
modify the node at any time. The second field, threads, is an integer that indicates the number
of threads working at the node. Recall that in DTS node ownership may transfer from one
processor to another. A pointer to the parent node is maintained in parent so that the node's
current owner knows where to return the score determined at the node when it is closed. When
returning the score determined at a node to its parent, certain data structures have to be
cleared at the parent so that the node isn't visited again. The integer field, parentidx, contains
an index that specifies the position of the child among the other children at the parent. For
example, if the current node is the fifth node expanded at the parent, parentidx would contain
4 (indexing starts at 0). A count of the moves that remain unexamined at a node is held in
1:  time ← 0
2:  CWRITE(tid, BlockT[tid])
3:  for r ← 16 · tid to 16 · (tid + 1) − 1
4:    for c ← 0 to 63
5:      t ← 0
6:      for i ← 0 to 63
7:        CREAD(tid, BlockA[(64 · r + i)/32])
8:        CREAD(tid, BlockB[(64 · i + c)/32])
9:        t ← t + a[r][i] · b[i][c]
10:       time ← time + 1
11:       if time = 10 then
12:         YIELD(tid, 1)
13:         time ← 0
14:     CWRITE(tid, BlockC[(64 · r + c)/32])
15:     c[r][c] ← t

Figure 6.3: Simulating matrix multiplication.
1:  record NodeT
2:    lock
3:    threads
4:    parent
5:    parentidx
6:    mvsleft
7:    mvsfini
8:    type
9:    alpha
10:   beta
11:   score
12:   branch[32]
13:   child[32]

Figure 6.4: The NodeT structure.
mvsleft. Note that a move that is currently being searched by a processor is not included in
this count. A similar count for moves that have been completely searched is held in mvsfini.
PABSim uses the minimax formulation as opposed to the negamax formulation. Recall that
in negamax every node is of the maximizing type. In minimax, a distinction is made between
a node that is of the minimizing type and a node that is of the maximizing type. A flag that
specifies the node type is stored in type. The lower and upper bound at a node are stored in
alpha and beta, respectively. At a minimizing node, the lowest value found so far is stored in
score. The same field is used at a maximizing node to store the highest value found. A list of
moves at a node is stored in the array branch. As branches are expanded, the newly allocated
node data structures are stored in child. Note that the highest branching factor at a node is
assumed to be 32.
6.4.2 Node Allocation
Nodes are allocated and deallocated as the search of the tree progresses. The allocation and
deallocation routines use two heaps of nodes: a global heap and a local heap. The global heap
is shared by all processors and it initially contains p · d · 2 nodes. Here, p is the number of
processors involved in the search and d is the depth of the tree being searched. Each processor
also has a local heap of nodes. Although the local heaps are initially empty, the local heaps
Figure 6.5: All possible state transitions for a processor performing parallel alpha-beta search.
can contain up to d · 2 nodes. When a processor allocates a node, the local heap is checked
for a free node before the global heap is used. The global heap is shared by all processors and
access to it is controlled by a lock, thus it is better to satisfy node allocation requests locally
whenever possible. When a node is deallocated, it is returned to the local heap if possible. If
the local heap has reached its maximum size, the node is returned to the global heap. At the
start of the search, all allocations will use the global heap. However, as the search progresses,
all node allocation requests will be satisfied locally.
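The two-level allocation policy described above can be sketched as follows. The class and its interface are illustrative assumptions, not PABSim code; in the real searcher the global heap would be protected by a lock.

```python
# Two-level node allocation: try the per-processor local heap first,
# fall back to the shared global heap; deallocate to the local heap
# unless it has reached its cap, as described above.
class NodeHeaps:
    def __init__(self, global_nodes, local_cap):
        self.global_heap = list(range(global_nodes))  # shared free nodes
        self.local_heap = []                          # per-processor free list
        self.local_cap = local_cap                    # e.g. d * 2 in the text

    def alloc(self):
        if self.local_heap:
            return self.local_heap.pop()   # cheap local allocation
        return self.global_heap.pop()      # lock-protected in the real searcher

    def dealloc(self, node):
        if len(self.local_heap) < self.local_cap:
            self.local_heap.append(node)   # keep the node local for reuse
        else:
            self.global_heap.append(node)  # local heap full; return globally
```

After a warm-up period, allocations and deallocations pair up locally, which matches the observation that all requests are eventually satisfied without touching the global lock.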
6.4.3 Search Procedure
A processor that is participating in the parallel alpha-beta search procedure defined by PABSim
may be in one of seven possible states: expand, search, eval, return, split, update and exit.
Figure 6.5 illustrates all possible state transitions that can occur. Except for the last state,
exit, there is a routine in PABSim corresponding to each of the six other states. These routines
are referred to as state routines. Each processor also maintains a pointer to the "current" node.
The target of all work performed by a processor is its current node. A processor may be viewed
as a state machine whose behavior is governed by its state and its current node. Parallel search
is then a result of several interacting state machines.
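The processor-as-state-machine view amounts to a dispatch loop over the state routines. The sketch below is a minimal stand-in: the routine bodies here are trivial placeholders following one path through Figure 6.5, not the real PABSim routines.

```python
# A processor is driven by a dispatch loop over its state routines:
# each routine does some work and then sets the next state, until exit.
def drive(tdata, routines):
    trace = []
    while tdata['state'] != 'exit':
        trace.append(tdata['state'])
        routines[tdata['state']](tdata)   # state routine picks the next state
    return trace

# Trivial stand-in routines tracing one possible path through the states.
def expand(td): td['state'] = 'search'
def search(td): td['state'] = 'eval'
def eval_(td):  td['state'] = 'return'
def ret(td):    td['state'] = 'exit'
```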
Figure 6.6 illustrates the TDataT data structure that is used to hold search data local to a
processor. A pointer to the current node is stored in node and the processor's current state is
stored in state. An additional field called value is used as a temporary data storage area when
a processor switches states. A variable of type TDataT is passed into each state routine.
State: expand
The routine corresponding to the expand state is illustrated in Figure 6.7. When a node is
initially created, its move list is empty. In the expand state, a call is made to a move generation
1: record TDataT
2:   node
3:   state
4:   value

Figure 6.6: The TDataT data structure.
1: BRANCHGENERATION(tdata.node)
2: tdata.state ← search

Figure 6.7: The expand state routine.
routine to create a valid move list. Since PABSim searches artificial trees, the move generation
routine that it uses creates a set of seeds that are to be passed on to the child nodes (see
Section 4.2). Once the move generation is complete, a processor that is in the expand state
changes to the search state.
The node being expanded is already locked before the expand routine is executed. It is
locked when it is initially created by the search state. When the state changes from expand to
search, the node remains locked.
State: search
The search state routine is given in Figure 6.8. At the start of the routine, two checks are
performed to ensure that the node is searchable. The first check verifies that the node score
is within the bounds at the node (lines 1-10) and the second check verifies that there are
unassigned moves at the node (lines 11-13). If either of the two checks fails, a transition is
made to the return state. If both checks pass, an unassigned move is expanded to create a new
node. A node created by NODEALLOC is initially locked. This is to prevent other processors
from entering the newly created node before the node's move list is populated. Note that when
the call to NODEALLOC returns, the processor executing the search routine holds two locks:
one lock controls access to the current node and the other lock controls access to the newly
created node. Search now moves to the new node: the lock corresponding to the current node
is released and the newly created node becomes the current node. If the new node is a leaf, the
processor's state changes from search to eval. However, if the new node is an internal node, the
state changes to expand.
State: eval
The eval state routine is illustrated in Figure 6.9. In the eval state, a score is determined for the
current node. Normally this would involve the execution of an application specific routine that
computes a score for the node. This routine usually accounts for a significant percentage of the
total search time. However, since PABSim searches artificial trees, the score is available instantaneously.
In PABSim, EVALUATE does not return the score immediately but delays the processor by a
specified amount of time before returning. Once evaluation is complete, the processor's state is
changed to return.
State: return
In the return state, the current node has been completely searched and the score determined
at the node is ready to be returned to its parent. The return state routine is presented in
Figure 6.10. Recall that in DTS, a node is owned by the last processor that is left searching the
node. The responsibility of returning the node's score to its parent falls on the node's owner.
At the start of the return routine, a check is performed to see if the current processor is the last
one at the node. If the current processor is not the last, the thread count is simply decremented
and the processor enters the split state where it searches for another node to work on. However,
if the current processor is the last one at the node, it begins the process of deallocating the node
(lines 6-12). If the node being deallocated is the root node itself, then the search is complete
and the processor enters the exit state. In addition, a global flag called searchend is set to
indicate that the search of the tree is complete. For any other node, the node's score is placed
within the processor's value field. The value field is used as temporary storage for a node's
score before it is actually used in the update state. The current node is then deallocated, the
parent node becomes the current node and the processor's state is changed to update.
State: update
In the update state. the seardi has just returned from the searcli of a child nocle. The chilct
node's score is found in the processor's value field. Figure 6-11 shows the update state routine.
If tlie score returnecl by the chilcl is better thcm the score at the node, the score is irpdated.
M71ien an update occiirs, the new score is actually a bound for the nodes loiver in the tree. If
tlie new score is a lower bouricl. TRANSLB is used to transmit the new bound to the siibtree
rooted at the current nocle. TRXNSLB performs a depth first traversal on the subtree. At eacli
SEARCH(tdata)
1:  if tdata.node.type = max then
2:    if tdata.node.score ≥ tdata.node.beta then
3:      tdata.node.score ← tdata.node.beta
4:      tdata.state ← return
5:      return
6:  else
7:    if tdata.node.score ≤ tdata.node.alpha then
8:      tdata.node.score ← tdata.node.alpha
9:      tdata.state ← return
10:     return
11: if tdata.node.mvsleft = 0 then
12:   tdata.state ← return
13:   return
14: temp ← NODEALLOC(tdata.node)
15: tdata.node.mvsleft ← tdata.node.mvsleft − 1
16: LOCKLEAVE(tdata.node.lock)
17: tdata.node ← temp
18: if tdata.node.depth = 0 then
19:   tdata.state ← eval
20: else
21:   tdata.state ← expand

Figure 6.8: The search state routine.
EVAL(tdata)
1: tdata.node.score ← EVALUATE(tdata.node)
2: tdata.state ← return

Figure 6.9: The eval state routine.
1:  if tdata.node.threads > 1 then
2:    tdata.node.threads ← tdata.node.threads − 1
3:    LOCKLEAVE(tdata.node.lock)
4:    tdata.state ← split
5:  else
6:    if tdata.node = root then
7:      LOCKLEAVE(tdata.node.lock)
8:      tdata.state ← exit
9:      searchend ← true
10:   else
11:     tdata.value ← tdata.node.score
12:     tdata.node ← NODEDEALLOC(tdata.node)
13:     tdata.state ← update

Figure 6.10: The return state routine.
1:  tdata.node.mvsfini ← tdata.node.mvsfini − 1
2:  if tdata.node.type = max then
3:    if tdata.value > tdata.node.score then
4:      tdata.node.score ← tdata.value
5:      if tdata.node.threads > 1 then
6:        TRANSLB(tdata.node)
7:  else
8:    if tdata.value < tdata.node.score then
9:      tdata.node.score ← tdata.value
10:     if tdata.node.threads > 1 then
11:       TRANSUB(tdata.node)
12: tdata.state ← search

Figure 6.11: The update state routine.
node, the new lower bound is compared to the existing bound. If the new bound is better, the
node's bound is changed. A similar routine exists for transmitting upper bounds and it is called
TRANSUB. Once the current node's score is updated, the processor enters the search state.
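The depth-first bound transmission performed by TRANSLB can be sketched as follows. The plain-dictionary node representation is an illustrative assumption; for a lower bound, a larger value is the "better" bound that the traversal keeps.

```python
# Depth-first transmission of a new lower bound to a subtree, in the
# spirit of TransLB: each visited node keeps the better (larger) bound.
def trans_lb(node, new_alpha):
    if new_alpha > node['alpha']:
        node['alpha'] = new_alpha      # tighten the bound at this node
    for child in node['children']:
        trans_lb(child, new_alpha)     # continue down the subtree
```

A TRANSUB analogue would mirror this with an upper bound, keeping the smaller value instead.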
State: split
The split state is a state in which idle processors try to find a node where work is available.
Figure 6.12 illustrates the split state routine. At the start of the routine, the global variable
searchend is checked to determine if the tree search is complete. If search is complete,
the processor in the split state enters the exit state. However, if the search is not complete,
the processor tries to find a split-point where the search effort can be shared. The SPLITP-
NTSEARCH routine can be used to search through the entire tree to find a split-point. However,
SPLITPNTSEARCH is too expensive to execute each time a split-point is needed. The previous
result of executing SPLITPNTSEARCH can be found in splitpnt. When a split-point is needed,
the node in splitpnt is checked for validity before SPLITPNTSEARCH is invoked. A valid split-
point is one that has some unassigned moves and one where the node score is within the node
bounds. Note that even after executing SPLITPNTSEARCH, a valid split-point may not be found.
In this case, the processor delays for some time before retrying.
Recall that we are trying to compare the performance of three different split-point selection
1:  for ever
2:    if searchend = true then
3:      tdata.state ← exit
4:      return
5:    LOCKENTER(splitpntlock)
6:    if SPLITPNTVALID(splitpnt) = true then
7:      splitpnt.threads ← splitpnt.threads + 1
8:      tdata.node ← splitpnt
9:      tdata.state ← search
10:     LOCKLEAVE(splitpntlock)
11:     return
12:   splitpnt ← SPLITPNTSEARCH()
13:   if SPLITPNTVALID(splitpnt) = true then
14:     splitpnt.threads ← splitpnt.threads + 1
15:     tdata.node ← splitpnt
16:     tdata.state ← search
17:     LOCKLEAVE(splitpntlock)
18:     return
19:   LOCKLEAVE(splitpntlock)
20:   DELAY(rechecktime)

Figure 6.12: The split state routine.
schemes. In Figure 6.12, SPLITPNTSEARCH is actually a function pointer. The function that
is executed changes depending on the split-point selection scheme being used.
State Timing
Apart from the delays due to memory access, lock entry and lock release, there are finite delays
associated with the computation that takes place within each state. The search and return
states each consume a tick of processor time. The split state is one tick in duration if a valid
split-point is found immediately. However, if a split-point search routine is started, an extra
tick of processor time is consumed for every node that is examined during the search. If the
search does not find a split-point, another SPLITPNTSEARCH is not started until 36 ticks have
elapsed. The update state is one tick in duration if no score updates are performed. However,
if an update has to be performed, an extra tick is spent for every child that has to be informed
of the score change. The expand and eval states are 3 and 8 ticks in duration, respectively.
6.5 Split-Point Selection Schemes
Three different split-point selection schemes are considered. The first scheme, A, is based on
the YBWC search procedure. Node classification is identical to the one described in Section 3.4.
In scheme A, a node can be selected as a split-point only if it meets the following criteria:
• At least one branch at the node must have been examined.
• If the node is of the Y-CUT type then at least mA branches must have been examined.
Here, mA is a value that is varied during the performance experiments.
At any point in the search, there will be several nodes in the tree that meet these criteria. Three guidelines are used to determine which node gets selected as the split-point. In order of decreasing priority, the guidelines are as follows:

1. A node that is higher up in the tree is preferable as it represents more work.

2. A node that has many branches already examined is a good split-point since the score at the node may have stabilized. This guideline is not applicable once more than a quarter of the moves at a node have been examined. For example, on a tree with a branching factor of 32, a node that has 4 branches examined is preferable to one that has 3 branches examined. However, a node that has 10 branches examined is considered equally good as a node that has 11 branches examined.

3. A node that has several unexamined branches makes a good split-point as a lot of parallel work is available at the node.
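A minimal sketch of how these guidelines could be applied to rank candidate split-points follows. The node representation and the exact tie-breaking mechanics are assumptions made for the sketch, not taken from PABSim:

```python
# Rank candidate split-points by the three guidelines, in decreasing priority:
# 1. prefer nodes higher in the tree (smaller depth),
# 2. prefer more examined branches, saturating at a quarter of the moves,
# 3. prefer nodes with more unexamined branches.
# The Node fields are assumptions made for this sketch.

from dataclasses import dataclass

@dataclass
class Node:
    depth: int       # 0 = root; a smaller depth means higher in the tree
    examined: int    # branches already examined at this node
    branches: int    # total branches (branching factor) at this node

def split_point_key(node):
    # Guideline 2 saturates at a quarter of the moves: beyond that point,
    # additional examined branches do not make a node more attractive.
    capped_examined = min(node.examined, node.branches // 4)
    unexamined = node.branches - node.examined
    # Sort ascending on depth, descending on the other two criteria.
    return (node.depth, -capped_examined, -unexamined)

def select_split_point(candidates):
    return min(candidates, key=split_point_key)
```

With a branching factor of 32, a node with 4 examined branches outranks one with 3, while nodes with 10 and 11 examined branches both saturate at 8 and compare equally under guideline 2; guideline 3 then breaks the tie.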
Scheme B is based on the DTS search procedure. Nodes in the tree are classified according to the scheme described in Section 3.5. In selecting a split-point, scheme B uses guidelines 1 and 3 from scheme A. However, guideline 2 is replaced by a new guideline that favors nodes with higher confidence. At a D-PV or D-ALL node, confidence is measured by the number of moves examined. As the number of moves examined increases, the confidence that the node is a D-PV or D-ALL node increases. The number of moves examined is also used at a D-CUT node as a measure of confidence. However, at a D-CUT node, the confidence decreases as more moves are examined. Once more than m_B moves have been examined at a D-CUT node, the node's type is changed from D-CUT to D-ALL and the confidence that it is a D-ALL node begins increasing. A node can be selected as a split-point only if it is a D-PV or a D-ALL node with a confidence of at least 1. Furthermore, in a manner similar to guideline 2 in scheme A, the confidence measure is not applicable if the nodes being compared have a confidence value that exceeds b/4 on a tree with a branching factor of b.
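One way to sketch scheme B's confidence measure is shown below. The thesis specifies only the qualitative behavior; the numeric scale and the exact functional form are assumptions made for this sketch:

```python
# Confidence sketch for scheme B. At D-PV and D-ALL nodes confidence grows
# with the number of moves examined; at a D-CUT node it shrinks as more moves
# are examined, and once more than m_b moves have been examined the node is
# reclassified as D-ALL. The numeric scale here is an assumption.

def confidence(node_type, moves_examined, m_b):
    """Return (possibly updated node type, confidence value)."""
    if node_type in ("D-PV", "D-ALL"):
        return node_type, moves_examined
    if node_type == "D-CUT":
        if moves_examined > m_b:
            # Reclassify: confidence that it is a D-ALL node starts growing.
            return "D-ALL", moves_examined - m_b
        return "D-CUT", m_b - moves_examined
    raise ValueError(node_type)

def eligible_split_point(node_type, moves_examined, m_b):
    """A node qualifies only as a D-PV or D-ALL node with confidence >= 1."""
    new_type, conf = confidence(node_type, moves_examined, m_b)
    return new_type in ("D-PV", "D-ALL") and conf >= 1
```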
The third scheme, C, uses the neural network described in Section 5.4 to classify the nodes in the tree. A node's type is determined using the procedure described in Section 5.5. The range used to determine if a node is of the N-CUT type is identical to that used in Section 5.5. In addition to the value used for node classification, the neural network also provides a constraint, w, on the number of branches that must be searched sequentially before the node can be selected as a split-point. For greater flexibility, two variables, mC_a and mC_c, are used to adjust the constraint that the neural network produces. At all N-ALL and N-PV nodes, w + mC_a branches have to be searched sequentially before the node may become a split-point. At an N-CUT node, the constraint is increased even higher: w + mC_c branches have to be searched sequentially before parallel search is possible. When a few processors are used for the parallel search, mC_a and mC_c are usually large: greater accuracy in split-point selection is obtained at the cost of reduced parallelism. However, when a large number of processors are participating in the search, mC_a and mC_c are set to a small value: split-point selection accuracy is reduced but the parallelism that is available is increased. From the set of nodes that pass these constraints, the node that is finally selected as the split-point is chosen using the same set of guidelines as in scheme A.
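The sequential-search constraint in scheme C can be sketched as follows. The variable names follow the text; treating the network's output w as a plain branch count added to mC_a or mC_c is our reading of the description, stated here as an assumption:

```python
# Scheme C eligibility sketch: the neural network supplies a constraint w on
# the number of branches that must be searched sequentially before a node
# may become a split-point; mC_a and mC_c adjust that constraint.

def scheme_c_eligible(node_type, branches_done, w, mC_a, mC_c):
    if node_type in ("N-ALL", "N-PV"):
        required = w + mC_a
    elif node_type == "N-CUT":
        required = w + mC_c   # N-CUT nodes get a stricter constraint
    else:
        raise ValueError(node_type)
    return branches_done >= required
```

Note that the experiments also use negative values for mC_a, which lowers the network's constraint and exposes more split-points at the cost of selection accuracy.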
6.6 Performance of Split-Point Selection Schemes
The performance of the three split-point selection schemes, A, B and C, was compared using the trees generated by set tg1. The results reported here are totals for the 100 trees generated by the set. During the performance experiments, the variables m_A, m_B, mC_a and mC_c were varied to find the values at which the highest speed-up was obtained. The figures reported here correspond to the highest speed-up obtained for each scheme.
6.6.1 Performance Measures
Before introducing the measures used to compare the performance of the split-point selection schemes, a few definitions are introduced. On a p processor system, the time taken to search the 100 trees generated by tg1 is denoted t_p. On the same set of trees, the number of nodes examined by a p processor search is denoted n_p.

Three measures are used to compare performance. The first measure, speed-up, is the traditional metric used to judge the efficiency of a parallel program relative to a sequential one:

SU_p = t_1 / t_p    (6.1)

A parallel search usually examines a larger tree compared to a sequential search. The search overhead associated with a p processor search is given by:

SO_p = n_p / n_1 - 1    (6.2)
Ideally, each processor participating in a parallel search spends all of its time searching the tree. However, there are several obstacles that prevent the realization of this ideal behavior. Communication is necessary in PABSim to ensure that each processor has the latest bounds. There are synchronization overheads associated with lock entry and release. Furthermore, the process of selecting a split-point for an idle processor adds extra overheads. These three overheads are grouped into a single measurement referred to as communication-synchronization-split (CSS) overhead. On a shared memory multiprocessor, communication and synchronization costs are usually small; much of the CSS overhead will be due to the time spent searching for a split-point. The CSS overhead can be approximated as the extra effort required to evaluate each node in a parallel search:

CSS_p = (p · t_p / n_p) / (t_1 / n_1) - 1    (6.3)
Note that the CSS overhead should not be used as a figure of merit without taking SU and SO into account. For example, a parallel search that spends most of its time evaluating subtrees that are usually cut off will have a small CSS overhead. However, this particular search will have a small speed-up and a large search overhead.
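The three measures can be computed directly from the timing and node counts. The CSS formula below encodes the "extra effort per node" approximation described in the text; it is written out here as our reading of that description, stated as an assumption rather than quoted:

```python
# Speed-up, search overhead, and approximate CSS overhead for a p-processor
# search, given uniprocessor time t1 / nodes n1 and parallel time tp / nodes np.

def speed_up(t1, tp):
    return t1 / tp                      # SU_p = t_1 / t_p

def search_overhead(n1, np):
    return np / n1 - 1                  # SO_p = n_p / n_1 - 1

def css_overhead(p, t1, n1, tp, np):
    # Extra effort per node: total processor time per node examined in the
    # parallel search, relative to the sequential time per node.
    return (p * tp / np) / (t1 / n1) - 1
```

Under this formulation the three measures satisfy (1 + SO_p)(1 + CSS_p) = p / SU_p, so a scheme cannot keep both overheads small unless its speed-up is close to p.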
6.6.2 Uniprocessor Search
Table 6.3 presents the results of a uniprocessor search on the trees generated by tg1. These results will be used in evaluating SU_p, SO_p and CSS_p as p takes on values in the set {12, 24, 48}.
6.6.3 Trees With Ordering (32, 79, 5)
The performance of schemes A, B and C under ordering (32,79,5) is presented in Table 6.4. Figure 6.13 compares the performance of the three schemes on the basis of speed-up, search overhead and CSS overhead. Schemes A and B perform similarly on all three measures. When using 24 and 48 processors, scheme C's behavior is remarkably different from that exhibited by schemes A and B. With 24 processors, scheme C has a lower speed-up. Furthermore, it has a higher search overhead as well as a higher CSS overhead. The higher search overhead can
Table 6.3: Performance of alpha-beta search on a single processor.

Ordering    t_1 (ticks)    n_1 (nodes)
32,79,5     220380112      22123250
be interpreted as a reduction in split-point selection accuracy, and the higher CSS overhead can be interpreted as an increase in the time spent performing communication and split-point selection. With 48 processors, scheme C has a much higher speed-up and an even higher search overhead. However, its CSS overhead is significantly lower than the CSS overhead calculated for schemes A and B. Here, scheme C keeps all processors busy by reducing the split-point selection accuracy. Although there is increased inaccuracy in the split-point selection, there is also an increase in the useful work performed by the processors participating in the search.
6.6.4 Trees With Ordering (20,69,19)
Table 6.5 presents the performance of the three split-point selection schemes under the ordering (20,69,19). Figure 6.14 compares the speed-up, search overhead and CSS overhead observed for the three schemes. Although schemes A and B perform similarly on the speed-up measurement, there is a difference in the way in which they achieve their individual speed-up values. Scheme A spends less time selecting a split-point and searches more nodes, whereas scheme B spends more time in split-point selection and examines fewer nodes. For both schemes, the speed-up with 48 processors is less than the speed-up obtained with 24 processors. With 48 processors, both schemes exhibit a large CSS overhead. As described in Section 6.6.1, CSS overhead is primarily due to the time spent searching for a split-point. Using a modified version of PABSim that provides extra timing information, the split state is found to account for over 60 percent of the total search time using scheme A. When using scheme B, the split state is found to account for over 70 percent of the total search time. The split state has effectively become the limiting factor in the parallel search for the following reasons:
• More processors are actively searching for a split-point.

• There are fewer split-points as a result of the smaller tree size and the constraints imposed by schemes A and B.
Figure 6.13: Performance of schemes A, B and C under ordering (32,79,5).
Table 6.4: Performance of schemes A, B and C under ordering (32,79,5). For each scheme, the table lists the parameter setting (m_A for scheme A, m_B for scheme B, and the pair mC_a, mC_c for scheme C) together with t_p (ticks), n_p (nodes), SU_p, SO_p and CSS_p at p = 12, 24 and 48.
With 12 and 24 processors, scheme C has similar speed-up values as schemes A and B. However, when searching with 48 processors, scheme C performs significantly better than schemes A and B. To achieve this speed-up, scheme C trades split-point selection accuracy in favor of keeping its processors busy.
6.6.5 Trees With Ordering (32,66,66)
The performance of schemes A, B and C under ordering (32,66,66) is presented in Table 6.6. A comparison of the speed-up, search overhead and CSS overhead exhibited by each scheme is presented in Figure 6.15. The performance of schemes A and B on all three measures is similar. Scheme C outperforms both scheme A and scheme B on the speed-up measurement. When searching with 12 processors, scheme C achieves a small search overhead at the cost of a small increase in CSS overhead. With 24 processors, the search overhead is still quite small but the increase in CSS overhead is much larger. When searching with 48 processors, scheme C reduces split-point selection accuracy to keep all its processors busy.
Table 6.5: Performance of schemes A, B and C under ordering (20,69,19). For each scheme, the table lists the parameter setting (m_A for scheme A, m_B for scheme B, and the pair mC_a, mC_c for scheme C) together with t_p (ticks), n_p (nodes), SU_p, SO_p and CSS_p at p = 12, 24 and 48.
Table 6.6: Performance of schemes A, B and C under ordering (32,66,66). For each scheme, the table lists the parameter setting (m_A for scheme A, m_B for scheme B, and the pair mC_a, mC_c for scheme C) together with t_p (ticks), n_p (nodes), SU_p, SO_p and CSS_p at p = 12, 24 and 48.
Figure 6.14: Performance of schemes A, B and C under ordering (20,69,19).
Figure 6.15: Performance of schemes A, B and C under ordering (32,66,66).
Table 6.7: Performance of schemes A, B and C under ordering (32,33,33). For each scheme, the table lists the parameter setting (m_A for scheme A, m_B for scheme B, and the pair mC_a, mC_c for scheme C) together with t_p (ticks), n_p (nodes), SU_p, SO_p and CSS_p at p = 12, 24 and 48.
6.6.6 Trees With Ordering (32,33,33)
Table 6.7 presents the performance of schemes A, B and C under the ordering (32,33,33). Figure 6.16 compares the performance of the three schemes on the basis of speed-up, search overhead and CSS overhead. Once again, schemes A and B exhibit a similar performance on all three measures. Scheme C has a higher speed-up relative to schemes A and B when searching with 12 and 48 processors. With 24 processors, all three schemes exhibit a similar speed-up. When searching with 12 and 24 processors, scheme C favors a higher split-point selection accuracy at the cost of a higher CSS overhead. When searching with 48 processors, it favors a lower CSS overhead so that all the processors can be kept busy.
6.6.7 Trees With Ordering (32,0,0)
The performance of schemes A, B and C under the ordering (32,0,0) is presented in Table 6.8. Figure 6.17 compares the performance of the three schemes on the basis of speed-up, search overhead and CSS overhead. All three schemes exhibit similar speed-ups. Compared to schemes A and B, there is a significant difference in the way scheme C achieves its speed-up results. With 12 and 24 processors, scheme C favors an increased split-point selection accuracy at the cost of increased CSS overhead. With 48 processors, the scenario is reversed.
Figure 6.16: Performance of schemes A, B and C under ordering (32,33,33).

Table 6.8: Performance of schemes A, B and C under ordering (32,0,0). For each scheme, the table lists the parameter setting (m_A for scheme A, m_B for scheme B, and the pair mC_a, mC_c for scheme C) together with t_p (ticks), n_p (nodes), SU_p, SO_p and CSS_p at p = 12, 24 and 48.
Split-point selection accuracy is allowed to decrease in order to keep all processors busy.
6.6.8 The Significance of the Neural Network Approach
In the experiments, the neural network approach, as embodied by scheme C, exhibits behavior that is quite unlike the other two schemes. In particular, for non-random trees (an exponential ordering where both w_b and w_c are non-zero), scheme C has a higher speed-up than schemes A and B when a large number of processors are used. Scheme C achieves this by reducing the split-point selection accuracy to keep all of its processors busy. In scheme A, when m_A = 1, a node may be selected as a split-point if at least one branch has been examined. Since scheme C achieves a higher degree of parallelism, it must have less stringent criteria for split-point selection. Specifically, scheme C may select a node as a split-point even before a branch has been fully examined. When searching a tree with a large w_b using a few processors, scheme C obtains a higher speed-up than schemes A and B by increasing the split-point selection accuracy at the cost of an increase in CSS overhead. The advantage that the neural network approach obtains over the other two schemes is primarily due to its flexibility. It is flexible enough to increase parallelism when a large number of processors are being used and to increase split-point selection accuracy when a few processors are being used.
Figure 6.17: Performance of schemes A, B and C under ordering (32,0,0).
Chapter 7
Conclusions and Future Work
7.1 Conclusions
This work offers three main contributions. First, the concept of an exponentially ordered game tree was introduced as a model for the game trees encountered in practice. Various statistics arising from the search of exponentially ordered trees were compared to the statistics arising from the search of chess trees. With a certain set of parameters, exponentially ordered trees were found to produce trees that were quite similar to chess trees. Second, neural networks were introduced as a method of solving the node classification problem. The neural network approach was found to outperform all other node classification techniques. The prediction accuracy of the neural network approach increased as the tree ordering parameter, w_b, increased. Third, the neural network approach was used as a split-point selection scheme for parallel search. Here, the neural network approach was found to perform better than techniques based on YBWC and DTS when a large number of processors was used. Furthermore, the neural network approach performed well on a few processors if the tree ordering parameter w_b was relatively large.
7.2 Future Work
There is a lot of room for experimentation with the neural network approach. In this work, we explored the performance of the neural network approach on exponentially ordered game trees. While exponentially ordered trees mimic most of the behavior of real game trees, there is no substitute for experimentation on real game trees. Chess has long served researchers as the main source of game trees, and chess can be used once again as a testbed for the neural network approach. The parallel search results described here are based on simulations. This can be extended by performing the same experiments on a real shared memory multiprocessor with real game trees.
Appendix A
Seeds Used For Artificial Tree Generation
The artificial tree generation method presented in Section 4.2 generates a tree based on the value of the seed parameter. The experiments conducted in this text use two sets of seeds: tg1 and tg2. The first set consists of 100 seed values while the second set consists of 200 seed values.
A.1 Set 1, tg1
A simple linear congruential random number generator produces the seed values for the first set, tg1. The random number generator is based on the equation r_n = (r_{n-1} × 16807) mod 2147483647. Note that this is one of the two linear congruential generators used in Section 4.2. To start the computation of r_n, r_0 is initially set to 12345678. The values of r_n, where 1 ≤ n ≤ 100, make up set tg1. Table A.1 presents the seeds in the set.
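The tg1 seed set can be reproduced directly from this recurrence; a minimal sketch:

```python
# Reproduce the tg1 seeds: a Lehmer (Park-Miller) linear congruential
# generator, r_n = (r_{n-1} * 16807) mod 2147483647, started at r_0 = 12345678.

def tg1_seeds(count=100, r0=12345678):
    seeds = []
    r = r0
    for _ in range(count):
        r = (r * 16807) % 2147483647
        seeds.append(r)
    return seeds
```

The modulus 2147483647 is the Mersenne prime 2^31 − 1, and 16807 = 7^5, the classic Lewis-Goodman-Miller multiplier, so every seed falls in the range 1 to 2^31 − 2.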
A.2 Set 2, tg2
The second set, tg2, is generated using the digits of π. A seed is formed using 9 sequential digits of π. For example, the first seed in this set is 314159265. Tables A.2 and A.3 present the seeds in this set.
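The construction of tg2 can be sketched as follows, assuming the 9-digit windows are consecutive and non-overlapping (the text's one example is consistent with this reading). Only a short prefix of π is hard-coded here, so the sketch yields just the first few seeds rather than the full 200:

```python
# Build tg2-style seeds from consecutive, non-overlapping 9-digit windows of
# pi's digits (including the leading 3). Only a short prefix of pi is
# hard-coded, so this sketch produces just the first five seeds.

PI_DIGITS = "314159265358979323846264338327950288419716939937510"

def tg2_seeds(digits=PI_DIGITS):
    n = len(digits) // 9
    return [int(digits[9 * i : 9 * (i + 1)]) for i in range(n)]
```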
Table A.1: The seed values in set tg1.
Table A.2: The first hundred seed values in set tg2.
Table A.3: The second hundred seed values in set tg2.