chess, chance and conspiracy.pdf

11
arXiv:0708.3961v1 [stat.ME] 29 Aug 2007 Statistical Science 2007, Vol. 22, No. 1, 98–108 DOI: 10.1214/088342306000000574 c Institute of Mathematical Statistics, 2007 Chess, Chance and Conspiracy Mark R. Segal Abstract. Chess and chance are seemingly strange bedfellows. Luck and/or randomness have no apparent role in move selection when the game is played at the highest levels. However, when competition is at the ultimate level, that of the World Chess Championship (WCC), chess and conspiracy are not strange bedfellows, there being a long and colorful history of accusations levied between participants. One such accusation, frequently repeated, was that all the games in the 1985 WCC (Karpov vs Kasparov) were fixed and prearranged move by move. That this claim was advanced by a former World Champion, Bobby Fischer, argues that it ought be investigated. That the only published, concrete basis for this claim consists of an observed run of particular moves, allows this investigation to be performed using probabilistic and statistical methods. In particular, we employ imbedded finite Markov chains to evaluate run statistic distributions. Further, we demonstrate how both chess computers and game data bases can be brought to bear on the problem. Key words and phrases: Chess, data bases, distribution theory, Markov chains, run statistics, streaks. Chess is a game of luck. If you have a good opponent you have bad luck, and if you have a bad opponent you have good luck (Jacob Segal, age 7). 1. INTRODUCTION Chess and chance are seemingly strange bedfel- lows. Many would contend that chess is the most logical/rational of all games, although such claims often get a rise from Go players. In any event, luck and/or randomness have no role in move selection when the game is played at the highest levels. How- ever, when competition is at the ultimate level, that Mark R. Segal is Professor, Department of Epidemiology and Biostatistics, and Director, Center for Bioinformatics and Molecular Biostatistics, University of California, San Francisco, California 94143-0560, USA e-mail: [email protected]. This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in Statistical Science, 2007, Vol. 22, No. 1, 98–108. This reprint differs from the original in pagination and typographic detail. of the World Chess Championship (WCC), chess and conspiracy are not strange bedfellows. There is a long and colorful history of accusations, levied between participants, instances of which are show- cased below. One such accusation, frequently re- peated, was that all the games in the 1985 WCC were fixed and prearranged move by move. That this claim was advanced by a former World Cham- pion, Robert (Bobby) Fischer, argues that it be in- vestigated. Now, the logic that Fischer’s chess play- ing credentials warrant such an investigation has been questioned (an Editor, Mark van der Laan, per- sonal communications) in view of widespread per- ceptions concerning his diminished faculties (e.g., Krauthammer, 2005). However, it is worth noting that there are adherents to his allegations among the chess-playing community, including one former World Champion (Spassky, 1999) and one former women’s World Champion (Z. Polg´ ar in Polg´ ar and Shutzman, 1997). Since the only published, concrete basis for Fischer’s claim consists of an observed run of particular moves, we pursue such an investigation using probabilistic and statistical methods. The paper is organized as follows. In the next sub- section we provide a very brief history of the World 1

Upload: olivia-wade

Post on 28-Jan-2016

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Chess, Chance and Conspiracy.pdf

arX

iv:0

708.

3961

v1 [

stat

.ME

] 2

9 A

ug 2

007

Statistical Science

2007, Vol. 22, No. 1, 98–108DOI: 10.1214/088342306000000574c© Institute of Mathematical Statistics, 2007

Chess, Chance and ConspiracyMark R. Segal

Abstract. Chess and chance are seemingly strange bedfellows. Luckand/or randomness have no apparent role in move selection when thegame is played at the highest levels. However, when competition isat the ultimate level, that of the World Chess Championship (WCC),chess and conspiracy are not strange bedfellows, there being a longand colorful history of accusations levied between participants. Onesuch accusation, frequently repeated, was that all the games in the 1985WCC (Karpov vs Kasparov) were fixed and prearranged move by move.That this claim was advanced by a former World Champion, BobbyFischer, argues that it ought be investigated. That the only published,concrete basis for this claim consists of an observed run of particularmoves, allows this investigation to be performed using probabilistic andstatistical methods. In particular, we employ imbedded finite Markovchains to evaluate run statistic distributions. Further, we demonstratehow both chess computers and game data bases can be brought to bearon the problem.

Key words and phrases: Chess, data bases, distribution theory, Markovchains, run statistics, streaks.

Chess is a game of luck. If you have agood opponent you have bad luck, and ifyou have a bad opponent you have goodluck (Jacob Segal, age 7).

1. INTRODUCTION

Chess and chance are seemingly strange bedfel-lows. Many would contend that chess is the mostlogical/rational of all games, although such claimsoften get a rise from Go players. In any event, luckand/or randomness have no role in move selectionwhen the game is played at the highest levels. How-ever, when competition is at the ultimate level, that

Mark R. Segal is Professor, Department of

Epidemiology and Biostatistics, and Director, Center

for Bioinformatics and Molecular Biostatistics,

University of California, San Francisco, California

94143-0560, USA e-mail: [email protected].

This is an electronic reprint of the original articlepublished by the Institute of Mathematical Statistics inStatistical Science, 2007, Vol. 22, No. 1, 98–108. Thisreprint differs from the original in pagination andtypographic detail.

of the World Chess Championship (WCC), chessand conspiracy are not strange bedfellows. Thereis a long and colorful history of accusations, leviedbetween participants, instances of which are show-cased below. One such accusation, frequently re-peated, was that all the games in the 1985 WCCwere fixed and prearranged move by move. Thatthis claim was advanced by a former World Cham-pion, Robert (Bobby) Fischer, argues that it be in-vestigated. Now, the logic that Fischer’s chess play-ing credentials warrant such an investigation hasbeen questioned (an Editor, Mark van der Laan, per-sonal communications) in view of widespread per-ceptions concerning his diminished faculties (e.g.,Krauthammer, 2005). However, it is worth notingthat there are adherents to his allegations amongthe chess-playing community, including one formerWorld Champion (Spassky, 1999) and one formerwomen’s World Champion (Z. Polgar in Polgar andShutzman, 1997). Since the only published, concretebasis for Fischer’s claim consists of an observed runof particular moves, we pursue such an investigationusing probabilistic and statistical methods.

The paper is organized as follows. In the next sub-section we provide a very brief history of the World

1

Page 2: Chess, Chance and Conspiracy.pdf

2 M. R. SEGAL

Chess Championship post World War II. This pro-vides context to the 1985 match. Conspiracy ele-ments surrounding the 1978 match are highlightedto indicate the climate in which these contests areconducted. Section 2 focuses on Fischer’s claim re-garding move runs and several approaches for evalu-ating the significance thereof. In particular, we em-ploy imbeddings in finite Markov chains that are es-pecially suited for this purpose. Section 3 describesfurther analytic possibilities for run assessment thatare afforded by use of powerful, chess-playing soft-ware, while Section 4 does likewise with respect togame data bases. Concluding discussion is presentedin Section 5.

For the most part, only a very rudimentary under-standing of chess is required for this paper. Ratherthan give definitions of pieces, legal moves and otherrules, we defer such to the many expository treat-ments available both in print and online.

1.1 World Chess Championship: History and

Controversy

Prior to 1948, the World Chess Championship wasarranged at the behest of the reigning World Cham-pion. This formulation allowed the Champion to se-lect substandard opponents and avoid leading ad-versaries. However, the confluence of the death ofthe titleholder, World War II and the emergence ofan international governing body, Federation Inter-nationale des Echecs (FIDE), led to the abandon-ment of such matches by invitation. In 1948, FIDEinvited six leading players to participate in a round-robin tournament for the title of World Champion.Subsequently, up until 1990, the WCC was orga-nized to run on a three-year cycle. The cycle startedwith the world’s best chess players being seeded intoone or more interzonal tournaments. The top fin-ishers in the interzonal tournaments qualified for aseries of elimination (“candidates”) matches. Theplayer who emerged victorious from the candidatesmatches met the reigning World Champion in a titlematch.

This period, which notably coincided with the ColdWar, was marked by the dominance of Soviet play-ers. From 1948 up to 1972 there were nine WCCs,each featuring a Soviet champion and challenger.However, in 1972, following a stunning surge throughthe interzonal tournament and candidates matchesthat included an unprecedented run of 19 consec-utive wins, an American, Robert (Bobby) Fischer,emerged as the challenger to titleholder Boris Spassky.

Fischer won convincingly, but forfeited the title toAnatoly Karpov in 1975.

Karpov’s challenger in the 1978 WCC was Vic-tor Korchnoi. Previously, in 1976, Korchnoi had de-fected from the Soviet Union and sought politicalasylum in the Netherlands. The USSR Chess Fed-eration put pressure on FIDE to exclude Korchnoifrom the candidates matches. FIDE did not buckleand, as fate would have it, Korchnoi defeated threeSoviet players on his way to the title match withKarpov. Shortly before the final candidates match,Korchnoi was injured in a serious car accident. Then,during the WCC itself, a remarkable series of claimsand counterclaims was exchanged, that showcasesthe tensions and controversies surrounding chess atthis level. These included (i) Karpov demanding thedismantling of Korchnoi’s chair to search for “extra-neous objects or prohibited devices,” (ii) Korchnoiwearing mirrored sunglasses to neutralize Karpov’shabit of staring at his opponent, (iii) Korchnoi ac-cusing Karpov of receiving move advice encoded bythe color of the yogurts that he was given during thegame, (iv) Karpov employing a parapsychologist,Dr. Zukhar, who sat in the audience, fixedly staringat Korchnoi, purportedly to disturb/hypnotize—thisled to intense bickering with regard to Zukhar’s seat-ing position and Korchnoi placing his own hypnotistand two Ananda Marga sect members in the audi-ence.

Ultimately, Karpov (barely) prevailed. He deci-sively beat Korchnoi in the subsequent 1981 WCC.Karpov’s next challenger for the 1984 WCC wasGary Kasparov, a mere 21-year-old, who had domi-nated his three candidates matches en route to qual-ifying. Rules for the 1984 WCC were that the win-ner would be the first player to achieve six victo-ries, there being no limit on the total number ofgames played. Karpov started convincingly, leading4–0 after 9 games and 5–0 after 27. However, Kas-parov won the 32nd game and, following anotherlong string of draws, won both the 47th and 48thgames. While Karpov still led 5–3, he had lost 10 kg(22 lbs), been hospitalized several times and was onthe verge of collapse, his condition compounded byalleged frequent use of stimulants. Amid great con-troversy, the president of FIDE canceled the match,citing the failing health of the players due to therecord breaking length and duration (6 months) ofthe contest. Provisions were made for a rematch tobe held in 6 months time, with a fixed number (24)of games, Karpov retaining his title in the event of a

Page 3: Chess, Chance and Conspiracy.pdf

CHESS, CHANCE AND CONSPIRACY 3

Fig. 1. Karpov vs Kasparov key position, prior to White’s21st move: 21. N × e6.

tie. Kasparov won the 1985 rematch, becoming theyoungest ever World Champion. He successfully, al-beit narrowly, defended his title in three subsequent(1986, 1987, 1990) WCC matches against Karpov.It is this body of Karpov vs Kasparov games, andaccusations surrounding them, that is the subject ofour further analysis.

2. KARPOV–KASPAROV 1985: MOVE RUNS

Fischer has frequently claimed that all games ofthe 1985 WCC match between Karpov and Kas-parov were rigged and prearranged move by move(Fischer, 1996; Polgar and Shutzman, 1997). No-tably, at his first press conference following his re-cent release to Iceland from immigration detentionin Japan, Fischer repeated this allegation (ESPNSportscenter, March 25, 2005). One element of thisaccusation, indeed the only concrete piece of evi-dence proffered, concerns the fourth game and theposition following Black’s 20th move as depicted inFigure 1.

Fischer (1996) asserts:

Starting on move 21, White makes no lessthan 18 consecutive moves on the lightsquares. Incredible!

Kasparov and Karpov played a total of 144 gamesthat totaled 5540 moves over the course of their

5 WCCs—we will investigate this incredibility interms of runs.

Consider n i.i.d. Bernoulli trials with success prob-ability p. Let Nn,k be the number of nonoverlappingsuccess runs of length k and let Ln be the lengthof the longest success run. Although our primaryinterest is in probabilities such as Pr(Ln > k), wewill sometimes obtain these by exploiting the equiv-alence Ln < k ⇔ Nn,k = 0. The exact probabil-ity distributions for these events have been histor-ically difficult to compute. Godbole (1990a) estab-lished the formula

Pr(Nn,k = x)

=∑

⌊(n−kx)/k⌋≤y≤n−kx

qypn−y(

y + xx

)

(1)

·∑

0≤j≤⌊(n−kx−y)/k⌋

(−1)j(

y + 1j

)(

n− kx− jky

)

.

While (1) offers computational advantages when con-trasted with previously established combinatoriallyderived alternatives (see, e.g., Philippou and Makri,1986), it is nonetheless problematic for evaluatingthe probability of observing a run of length k = 18 ina series of length n = 5540. However, Feller (1968),exploiting the theory of recurrent events, providesan approximation that is highly accurate even formoderate n,

Pr(Ln < k) ≈1− pθ

k + 1− kθ·

1

θn+1,(2)

where θ solves θ = 1 + qpkθk+1 and q = 1 − p. Ap-plying (2) to the Karpov–Kasparov collection gives

Pr(L5540 ≥ 18) ≈ 0.0105.

Now, this p-value is not incredibly incredible, espe-cially since it does not accommodate Fischer’s im-plicit search for other (run) patterns—presumably arun of moves to dark squares would also have regis-tered. On the other hand, it allows runs to straddledifferent games and this does not reflect Fischer’slikely ascertainment process. Rather, runs would bedetected within games and, accordingly, individualgames should constitute the units of analysis.

So, focusing on the key game, we are interested inPr(L63 ≥ 18), which can be evaluated using either(1) or (2). These give (with agreement to six decimalplaces)

Pr(L63 ≥ 18) = 0.0000896.(3)

Page 4: Chess, Chance and Conspiracy.pdf

4 M. R. SEGAL

If this p-value is adjusted for multiplicity (numberof games = 144) via Bonferroni correction, we ob-tain an (adjusted) p-value (0.013) that again doesnot impress as being overly incredible. However, notonly is such Bonferroni correction conservative, butit also does not reflect the length structure of thegame collection. At the cost of obtaining p-valuesfor the maximal run within each game, we couldseek to redress these concerns using, say, step-downBonferroni (Dudoit, Shaffer and Boldrick, 2003) orfalse discovery rates (Storey, Taylor and Siegmund,2004). But there is a far more fundamental con-cern that affects the computation of these individ-ual p-values. The results given in (1) and (2) re-quire that the underlying sequence of Bernoulli tri-als is i.i.d. Here the corresponding ith random vari-able can be designated as Xi ≡ IWhite’s movei →light square and identically distributed equates topi = Pr(Xi = 1) = p∀ i. All the above p-value cal-culations have been based on the seemingly naturalspecification pi = 0.5. We next further scrutinize thisspecification.

The prescription pi = 0.5 derives from the factthat 32 of the 64 squares on the chessboard are light.Indeed, the symmetries and piece and pawn movesare such that, prior to White’s first move, there areequal numbers of moves to light and dark squares.However, that situation does not necessarily persist.For example, following Fischer’s generally preferredfirst move, 1. e4, we have p2 ≥ 2/3. In studying hit-ting streaks in baseball, Albright (1993) employedso-called situational covariates to capture variationin underlying probabilities of getting a hit for a givenat bat. These included such features as home fieldand day/night game. We could attempt an analo-gous approach here. Relevant covariates would in-clude presence of opposite-colored bishops, presenceof knights, nature of the pawn structure and evenadvent of a time control, the latter often inducingrepetitions. However, selecting, characterizing andquantifying such positional covariates seems prob-lematic, especially considering the availability of asimple proxy.

The proxy is based on the set of legal moves. Afterall, 18 consecutive light square moves in checkers

Fig. 2. Probabilities of White making a legal move to a light square. The 18 move run is indicated by the dashed vertical

lines.

Page 5: Chess, Chance and Conspiracy.pdf

CHESS, CHANCE AND CONSPIRACY 5

would be truly incredible, since checkers is playedexclusively on dark squares. Define

pi =#legal movesi → light square

#legal movesi.(4)

Results that follow are not affected by how possibleambiguities surrounding castling are handled. Here,we exclude castling throughout. At the onset of therun we have p21 = 33/43. Using this value in either(1) or (2) gives

Pr(L63 ≥ 18|p = p21) = 0.096.

While again this is not remarkable, even prior tomultiplicity adjustment, we have imposed p21

throughout the game. Inspection of Figure 2, whichplots pi vs i, reveals this to be an extreme choice.What is needed is a method for evaluating runs withvarying success probabilities.

Fu and Koutras (1994) provided just such amethodology. They showed that the distributions ofa variety of run statistics can be readily evaluatedin the nonidentically distributed case by appropri-ate imbedding in a finite Markov chain. We brieflyrecapitulate the relevant definitions and results. Forgiven n, let Γn = 0,1, . . . , n be an index set andlet Ω = a1, . . . , am be a finite state space. A non-negative, integer-valued random variable Zn can beimbedded into a finite Markov chain if:

(a) there exists a finite Markov chain Yi : i ∈Γn = 0, . . . , n defined on Ω,

(b) there exists a finite partition Cx, x = 0,1,. . . , l on Ω, and

(c) for every x = 0,1, . . . , l, we have Pr(Zn = x) =Pr(Yn ∈ Cx).

If Zn can be imbedded into a finite Markov chainthen, simply as a consequence of the definition andChapman–Kolmogorov equations, we have (Theo-rem 2.1, Fu and Koutras, 1994)

Pr(Zn = x) = π0

(

n∏

i=1

Λi

)

U ′(Cx),(5)

where Λi is the m×m transition probability matrixof the Markov chain, π0 is the initial distributionand U(Cx) =

r : ar∈Cx

er for er a 1×m unit vectorhaving 1 at the rth coordinate and 0 elsewhere.

The art in utilizing imbedded Markov chains forevaluating run statistic distributions lies in elicitingsuitable Ω and Λi. For our run statistic of interest,

Nn,k (recall equivalence to Ln), Fu and Koutras es-

tablished Pr(Nn,k = 0) = π∗0(∏n

i=1 Λ∗i )U

∗′ , where

Λ∗i

(k+1)×(k+1)

=

qi pi 0 0 . . . 0qi 0 pi 0 . . . 0qi 0 0 pi . . . 0...

......

.... . .

...qi 0 0 0 . . . pi

0 0 0 0 . . . 1

and π∗0 = (1,0, . . . ,0) and U∗ = (1, . . . ,1,0) are 1 ×

(k +1) vectors. For the key Karpov–Kasparov gameuse of this imbedding, with pi as per (4) and Fig-ure 2, gives

Pr(L63 ≥ 18|pi) = 0.0065.(6)

The densities (smoothed histograms) for L63 underpi and pi = p = 0.5 are presented in Figure 3. Theshift in mass corresponding to (appropriate) use oflegal moves is evident and underscores the differencebetween the p-values (3) and (6). We note that com-putation via imbedded Markov chains is exceedinglyfast and easy.

Since (6) is our final p-value proffered for Karpovand Kasparov’s 18-move run of light square moves,we provide some context with regard to magnitudeby citing Short and Wasserman’s (1989) appraisal ofthe p-value they computed in assessing the “streak-of-streaks”—Joe DiMaggio’s 56-game hitting streak.They reported one p-value of 0.0000486 and com-ment that “the paucity of this probability is notoverwhelming.” Accordingly, we would not charac-terize the p-value in (6) as incredible.

3. CHESS PROGRAMS: FRITZ

In baseball, the batters objective is to hit, the 1919Chicago Black Sox excepted. In basketball, wherestreak shooting has also been statistically evaluated(Larkey, Smith and Kadane, 1989; Tversky andGilovich, 1989a, b), the objective is to make thebucket, with 1978–1979 Boston College (among oth-ers) excepted. In chess, if the objective was to moveto light squares, there would be a lot more sacrificeof dark square bishops. So, to infer anything about“signal” or “suspicion” in the moves that constitutethe run singled out by Fischer, it is necessary tojudge the quality of these moves. If, conditioning onsuccessive positions within the streak, the best movehappens to be to a light square, then what is themessage/implication? Is it incredible if the WorldChampion makes 18 good moves in a row? Was the

Page 6: Chess, Chance and Conspiracy.pdf

6 M. R. SEGAL

quality of moves played during the run notably dif-ferent than the quality of moves chosen outside therun? Of course, judging quality is tricky—Fischer,Karpov and Kasparov are all former World Cham-pions and regarded among the greatest players of alltime—it would be presumptuous for mere mortals tosecond guess their evaluations.

So, to make such judgments we turn to chess-playing software. Select programs are now capableof playing competitively against the world’s leadinggrandmasters. For example, in October 2002, WorldChampion Vladimir Kramnik drew an eight-gamematch against the Fritz program; in early 2003, Kas-parov drew a six-game match against the Deep Ju-nior program; and in November 2003, Kasparov drewa four-game match against Fritz. Accordingly, weuse Fritz to evaluate move quality. While we contendthat these results, along with wider tournament per-formances and ratings, establish Fritz’s credentialsto determine good moves, it is important to recog-nize that we are not claiming infallibility or even su-periority of Fritz’s move assessments. Rather, since

comparisons will be relative—in run (R)/not in run(R)—all that is required is that strong moves areconsistently recognized and that the assessments areunbiased with respect to the move run.

Fritz returns a quantitative score for each moveevaluated. Here, the more positive the score, thebetter the move. We cannot just compare R andR scores directly, since differing score distributionsreflect the decisiveness of the move and so, for exam-ple, are correlated with stage of the game. Thus, forour first outcome, we standardize according to thebest move possible and define ∆i = scorei−bestscorei,where scorei is Fritz’s score for the ith move asplayed and bestscorei is the score for the best pos-sible move. So, ∆i ≤ 0 with less negative values cor-responding to better moves. Comparing the ∆s ob-tained during the run (nR = 18) to those not in therun (nR = 45) via a two-sample t-test shows thatbetter moves tended to be played during the run,but the difference is nonsignificant (p = 0.27). Thesame findings hold if we limit nonrun moves to thenine preceding and nine succeeding the run (to exert

Fig. 3. Smoothed histograms of Pr(L63 = k). Tail areas to the right of the dashed line at k = 18 correspond to the p-value

reported in (3) and (6).

Page 7: Chess, Chance and Conspiracy.pdf

CHESS, CHANCE AND CONSPIRACY 7

finer control over stage of game), and/or if we workon a relative scale (∆∗

i = ∆i/bestscorei), and/or ifwe work with ranks instead of scores.

A more sophisticated approach is afforded by em-ploying the binary outcome that indicates whetherthe move played coincided with Fritz’s best move,Yi = I∆i = 0, and using logistic regression to at-tempt to adjust for putative covariates. This alsoremedies a concern with t-test appropriateness inview of the underlying mixed (discrete with mass atzero, continuous) distribution of ∆. As previously,covariate specification is challenging: what we pro-pose here is natural but not exhaustive. Playing thebest move when it is obvious is less incredible thanwhen it is obscure. Accordingly, we adjust for po-sitional complexity which, for each move, we op-erationalize as complexity(j, k) = score(jth best)/score(kth best) for differing choices of j > k = 1, . . . ,5.Similarly, playing the best move when the number ofpossibilities is limited is also less compelling, so weuse possibles = #legal moves as another covari-ate. We fitted several models of the form logit(Y ) =βrun+ f(possibles, complexity(j, k)), correspondingto different specifications for f, j and k. All gavehighly null results with respect to tests for β, theparameter of interest.

Therefore, when considering the quality of movesplayed, there is nothing distinctive, let alone incredi-ble, about the run Fischer isolates. However, in addi-tion to relying on Fritz’s assessments of move qual-ity, these analyses could be criticized on the basisof being underpowered to detect distinguishing at-tributes of the run. So, as a final approach, we turnto repositories of chess games to provide broadercontext for the run.

4. SEARCHING CHESS GAME DATA BASES

Efron (1971) reexamined Bode’s law “governing”mean planetary distances to the Sun as an exem-plar of whether an observed sequence of numbersfollows a simple rule. He states that differing an-alytic approaches would have been employed hadmeasurements on 50 solar systems been available. Sofar, our treatment of the significance of the run hasbeen confined to the key Karpov–Kasparov game inquestion. However, unlike the planetary system situ-ation, there are extensive data bases of chess gamesthat we can mine so as to appraise the distinctive-ness of the run identified by Fischer. To this end weconducted three structured searches.

The first was based on the key position itself (1). Ifthis position had occurred in other high level games,then the ensuing sequence of moves would be poten-tially informative with regard to the incredulity ofthe run. Not surprisingly, search of an online database that comprises more than two million games(www.chesslab.com/) revealed the position to beunique.

The next search is motivated by an important fea-ture of the key position: the presence of opposite-colored bishops. This is arguably the most impor-tant attribute with respect to generating move runson either light or dark squares. We used the com-mercial Chessbase (www.chessbase.com/) Big 2000data base which, although not as sizable as Chesslab(1,327,059 games), possesses more powerful searchtools. The following search parameters were prescribed—for a game to be selected it must satisfy all require-ments at some stage:

1. Opposite-colored bishops2. No knights or major pieces3. Three to four pawns4. Average Elo rating of players ≥ 25005. Length of game ≥ 80 moves

In addition to limiting the number of games chosen,these criteria are motivated by the following con-siderations. Item 2 eliminates pieces that can moveon either color squares; item 3 balances complex-ity; item 4 imposes grandmaster level play. Item 5,which mandates relatively long games, is motivatedby run statistic asymptotics. In particular, if n,k →∞ such that nqpk → λ, then Pr(Nn,k = x) tends tothe Poisson probability e−λλx/x! (Feller, 1968; God-bole, 1990b). So if p is fixed, k = O(log(n)), therebybestowing a premium on examining longer games.

The resulting search of Chessbase Big 2000 yielded11 games that had, in chronological order, the fol-lowing maximal run lengths: L85 = 21,L109 = 29,L81 = 46,L90 = 10,L92 = 13,L85 = 18, L81 = 17,L86 = 14,L98 = 20,L113 = 9,L94 = 12. Fischer’s iden-tified Karpov–Kasparov run, with L63 = 18 does notstand out against this collection. Indeed, if anythingis notable, it is the third game with L81 = 46: notonly was this game played at the candidates level(Timman vs Salov, 1988), but, in addition to thecited 46-move run (played by Black) that attainsan appreciably smaller p-value than the Karpov–Kasparov run (2.24 × 10−13), it featured a separate34-move run (played by White).

Page 8: Chess, Chance and Conspiracy.pdf

8 M. R. SEGAL

Fig. 5. Probabilities of White making a legal move to a light square. The 13-move run is indicated by the dashed vertical

lines.

For the third and final search we elected to scru-tinize the 827 Chessbase Big 2000 games of Fischerhimself. Search item 1 was retained, the restrictionsimposed by items 2, 3 and 4 were eliminated, anditem 5 was relaxed to game lengths ≥ 50 moves.This yielded a total of five games. We focus on oneof these, Fischer vs Reshevsky, 1957, United StatesChampionship. In particular, the position arisingprior to White’s 34th move is of interest; see Fig-ure 4. From this position Fischer (White) played 13consecutive moves to dark squares, this in the con-text of a 57-move game with move-by-move prob-abilities of moving to a light square as depicted inFigure 5.

We follow the same approach in using imbeddedMarkov chains to evaluate the significance of thisrun, using pi in accordance with (4) and Figure 5,as was employed in Section 2. This gives

Pr(L57 ≥ 13|pi) = 0.0023.(7)

The punch line is that the p-value in (7) is more ex-treme than that achieved by the Karpov–Kasparovrun given in (6). Thus, if the latter is “incredible,”

what does this imply about the former? It is of in-terest to note how the pi fluctuations in the Fischer–Reshevsky game, as depicted in Figure 5, are suchthat the probability density for Pr(L57 = k) underpi from (4) is barely distinguishable from that underpi = 0.5, as illustrated by the smoothed histogramsin Figure 6.

5. DISCUSSION

In their evaluation of coincidences, Diaconis andMosteller (1989) showcased the roles played by mis-perception, multiplicity and the law of truly largenumbers. These factors have arguably contributedto Fischer’s overstating the significance of the Karpov–Kasparov move run. Indeed, misperceptions surround-ing the significance of runs arising from random pro-cesses are purportedly commonplace; see, for exam-ple, Scheaffer, Gnanadesikan, Watkins and Witmer(1996) for pedagogic exercises that illustrate thispoint. These misperceptions are compounded when(implicit) search for an extreme run is performedand multiplicity considerations are ignored; see Al-

Page 9: Chess, Chance and Conspiracy.pdf

CHESS, CHANCE AND CONSPIRACY 9

Fig. 6. Smoothed histograms of Pr(L57 = k).

Fig. 4. Fischer vs Reshevsky key position, prior to White’s34th move: 34. Nd4.

bert (2004) for some examples from baseball. In ad-dition to seemingly failing to account for these con-cerns when assessing run significance, Fischer, in ex-pressing incredulity at the Karpov–Kasparov moverun, has also assumed a constant “success” proba-bility for each move: pi = p = 0.5.

Even aside from variation in pi, it is clear from(1) or (2) that Pr(Ln ≥ k) depends heavily on p.This has again been noted in the baseball context(Casella and Berger, 1994)—a more successful player(team) is likely to have a longer hitting (winning)streak than a less successful player (team). Indeed,Lou (1996) provided an elegant extension to theimbedded Markov chain approach of Fu and Koutras(1994) that allows for evaluation of joint and con-ditional distributions of the number of successes andLn. Not only does her framework accommodate vary-ing pi, but it also facilitates testing of between-movedependency. However, application of these methodshere is problematic owing to difficulties in estimat-ing, or even prescribing, the requisite transition prob-abilities. Furthermore, the power curves displayedby Lou (1996) suggest that in our n = 63 setting we

Page 10: Chess, Chance and Conspiracy.pdf

10 M. R. SEGAL

will be hard-pressed to detect between-move depen-dencies. Additionally, as noted by Rubin, McCullochand Shapiro (1990), there are instances where suchconditional analyses are inappropriate. These includesituations when there is no intrinsic interest in thetotal number of successes, as is the case here wheresuccess corresponds to a move to a light square.

Alternative approaches to appraising the signif-icance and/or asserting the existence of runs andstreaks have been advanced. In the sports context,Yang (2004) employs Bayesian binary segmentationto assess whether transitions in success probabilityare evident. Applying his methodology to the keyKarpov–Kasparov game yields highly null results.

As highlighted by a referee, evaluation of successruns can be highly dependent on (i) the samplespace employed and (ii) how the success probabil-ities, pi, are framed. Here, we have argued that (i)the appropriate unit of analysis is a game, ratherthan a match or series of matches, and (ii) allow-ing variable pi, that at the least reflects the num-ber of legal moves available, is essential. The resul-tant achieved significance levels were sufficiently null[see (6)] that the above mentioned multiplicity cor-rections [for the number of games in the Karpov–Kasparov match(es)] were not pursued.

However, in other settings, including two celebratedexamples briefly described next, these considerationscan be less clear-cut. In their evaluations of JoeDiMaggio’s 56-game hitting streak (baseball), Shortand Wasserstein (1989) performed various probabil-ity calculations that are readily reproduced usingeither (1) or (2). What distinguishes the differingcalculations are the alternative sample spaces con-sidered: season, career, history of baseball. Indeed,dramatically different results ensue, with no attemptto arbitrate or reconcile between them. Throughout,they employ constant p based on lifetime battingaverage. In this case, given the rich documentationsurrounding baseball, it might be possible to im-prove on this imposition. This could make recourseeither to day-of-game batting average or, more am-bitiously, to modeling pi based on game specific co-variates, akin to Albright (1993). However, as ismade clear by the discussants of that article, suchmodeling is problematic. Furthermore, despite theextensive compilation of historical baseball statis-tics, extracting the requisite game level data is, atbest, a laborious undertaking.

Tversky and Gilovich (1989a) contended that the“hot hand” phenomenon in basketball is a “cogni-tive illusion” that derives from misconceptions of the

laws of chance. They base this assertion on analysesof data obtained by coding shot attempts from 48televised NBA games plus free throw sessions of col-lege players. In an effort at refutation, Larkey, Smithand Kadane (1989) took issue with the analyses inlarge part on the basis of sample space considera-tions. They argued that the unit of analysis shouldreflect “cognitively manageable chunks,” capturingtemporal proximity, which they operationalize as20 consecutive shot attempts. Tversky and Gilovich(1989b) offer a rebuttal that (in part) further re-fines sample space considerations by accommodat-ing individual playing times. Here a definitive char-acterization of sample space seems daunting. Morechallenging still is capturing variations in pi, despitethe widely acknowledged need to do so. This de-rives from the need to accommodate the hard-to-quantify notion of defensive pressure, above and be-yond facets such as position on the floor and phaseof the game. Further compounding these obstaclesis the absence of usable data bases: the analysesconducted required the individual investigators towatch and encode film, which in and of itself led todramatic disputation (Tversky and Gilovich, 1989b).

In our evaluations of the significance of the Karpov–Kasparov move run, not only have we benefited frombeing able to frame the problem (relatively) pre-cisely, but also from the existence of sophisticatedchess-playing software and game data bases. Thelatter tool could be used to assess a run potentiallyfar more remarkable than the move run, namely,Fischer’s own aforementioned run of 19 consecu-tive wins as part of the 1972 WCC qualifying cycle.This run included two 6–0 shutouts in his two candi-dates matches. As an indication of the incredibilityof these results and to ground things in the worldof grounders, Time magazine equated the shutoutswith pitching “two straight no-hitters.” Perhaps Fis-cher’s ascent to World Champion was part of someconspiracy.

ACKNOWLEDGMENTS

I thank the Executive Editor, an Editor, two ref-erees and Chuck McCulloch for many helpful com-ments and suggestions.

REFERENCES

Albert, J. H. (2004). Streakiness in team performance.Chance 17(3) 37–43. MR2061937

Page 11: Chess, Chance and Conspiracy.pdf

CHESS, CHANCE AND CONSPIRACY 11

Albright, S. C. (1993). A statistical analysis of hittingstreaks in baseball (with discussion). J. Amer. Statist. As-

soc. 88 1175–1196.Casella, G. and Berger, R. L. (1994). Estimation with

selected binomial information or do you really believe thatDave Winfield is batting .471? J. Amer. Statist. Assoc. 89

1080–1090. MR1294753Diaconis, P. and Mosteller, F. (1989). Methods for study-

ing coincidences. J. Amer. Statist. Assoc. 84 853–861.MR1134485

Dudoit, S., Shaffer, J. P. and Boldrick, J. C. (2003).Multiple hypothesis testing in microarray experiments.Statist. Sci. 18 71–103. MR1997066

Efron, B. (1971). Does an observed sequence of numbers fol-low a simple rule? (Another look at Bode’s law). J. Amer.

Statist. Assoc. 66 552–559.Feller, W. (1968). An Introduction to Probability The-

ory and Its Applications 1, 3rd ed. Wiley, New York.MR0228020

Fischer, R. J. (1996). Excerpts from an interview. Inside

Chess December 23 9.Fu, J. C. and Koutras, M. V. (1994). Distribution theory

of runs: A Markov chain approach. J. Amer. Statist. Assoc.

89 1050–1058. MR1294750Godbole, A. P. (1990a). Specific formulae for some suc-

cess run distributions. Statist. Probab. Lett. 10 119–124.MR1072498

Godbole, A. P. (1990b). Degenerate and Poisson conver-gence criteria for success runs. Statist. Probab. Lett. 10

247–255. MR1072518Krauthammer, C. (2005). Did chess make him crazy? Time

April 26 96.

Larkey, P. D., Smith, R. A. and Kadane, J. B. (1989).It’s okay to believe in the “hot hand.” Chance 2(4) 22–30.

Lou, W. Y. W. (1996). On runs and longest run tests: Amethod of finite Markov chain imbedding. J. Amer. Statist.

Assoc. 91 1595–1601. MR1439099Philippou, A. N. and Makri, F. S. (1986). Successes,

runs and longest runs. Statist. Probab. Lett. 4 101–105.MR0829441

Polgar, Z. and Shutzman, J. (1997). Queen of the Kings

Game. CompChess, New York.Rubin, G., McCulloch, C. E. and Shapiro, M. A. (1990).

Multinomial runs tests to detect clustering in constrainedfree recall. J. Amer. Statist. Assoc. 85 315–320.

Scheaffer, R. L., Gnanadesikan, M., Watkins, A. andWitmer, J. (1996). Activity-Based Statistics. Springer,New York.

Short, T. and Wasserman, L. (1989). Should we be sur-prised by the streak of streaks? Chance 2(2) 13.

Spassky, B. (1999). Excerpts from an interview. Inside Chess

12(4) 22–23.Storey, J. D., Taylor, J. E. and Siegmund, D. (2004).

Strong control, conservative point estimation and simulta-neous conservative consistency of false discovery rates: Aunified approach. J. R. Stat. Soc. Ser. B Stat. Methodol.

66 187–205. MR2035766Tversky, A. and Gilovich, T. (1989a). The cold facts about

the hot hand in basketball. Chance 2(1) 16–21.

Tversky, A. and Gilovich, T. (1989b). The “hot hand”:Statistical reality or cognitive illusion. Chance 2(4) 31–34.

Yang, T. Y. (2004). Bayesian binary segmentation procedurefor detecting streakiness in sports. J. Roy. Statist. Soc. Ser.

A 167 627–637. MR2109410