
Quality of play in chess and methods for measuring

Erik Varend

Tallinn, 2014

Abstract. This study uses computer analysis to examine the absolute strength of play of various chess-playing entities (humans and computers). First of all, the actual accuracy of play is determined, measured via the mean difference between the move suggested by the engine and the move actually made. Thereafter, individually for each entity, the factors that have an effect on the accuracy of play are determined, and an estimated accuracy of play is found based on those factors. It shows what the accuracy would be if all factors were the same for all players. As a result, it was determined and proven that there is a relationship between rating and the quality of play. In addition, it was also proven that the further one goes back in time, the more the quality of play decreases. By comparing the accuracy of play of both humans and engines, it was determined to what extent CCRL and FIDE ratings correlate. The author also drew several miscellaneous conclusions based on the collected data.

1. Introduction

The primary aim of this study is to find a correlation between the strength of play (expressed as either a FIDE or a CCRL rating) and the accuracy of play. In addition, 7 of the most noteworthy performances in the history of chess are compared, and it is examined how the strength of chess players has changed over time. The final section of the paper contains various other conclusions that can be drawn from the data collected.

There are different ways to estimate and compare performances:

1 by measuring absolute strength
2 by measuring relative strength.

Absolute strength can be defined by how far a performance is from the perfect performance, i.e. the distance between the actual performance and the best performance possible. The closer a performance stands to absolute perfection, the better it is. In the case of relative strength, a performance is compared to the results of other performers, and the actual strength has no importance at all. In circumstances where ascertaining the absolute strength has not been easily feasible, there has normally been no choice but to use relative strength measurement as a yardstick. The latter is prevalent in one-on-one sports such as snooker, tennis and chess, where the ELO rating is used to compare the strength of players. Comparison of players and performances from different epochs is only possible using absolute strength. It is, for example, impossible to say who was stronger, Lasker or Spassky, using their chessmetrics ratings.

Consequently, we need to find an indicator of absolute strength in chess. There is a variety of ways to do this, which can be split into two primary types:

1 tablebases
2 various computer-based estimations

Certainly, the most preferred would be tablebases, because they give perfect solutions for each position. In that case the accuracy of play would be measured in the mean number of transitions per move. A transition is a change in the state of the game (won, drawn, or lost), assuming perfect play from both sides. A change from a won position to a drawn one, or from a drawn one to a lost one, equals 1 transition. If a won position becomes a lost position, it is 2 transitions. The fewer transitions per move, the higher the quality of play. Four-piece tablebases were completed by the end of the 80s. 5-piece TBs were compiled in the early nineties, those with 6 pieces in 2005, and now we have 7-piece tablebases. This implies that it is quite hopeless to expect complete 32-piece tablebases in the near future. That is the reason why chess engines are necessary. There are many ways to describe the absolute accuracy with the help of chess engines:

1 The average difference between the best move suggested by the engine and the move actually made
  1.1 difference expressed in centipawns
  1.2 difference expressed in percentages
2 The average change in evaluation after the move made by the player
3 The percentage of moves that coincide with those suggested by the engine
4 The percentage of moves where the error exceeds a predetermined threshold

The version 1.1 can be called the classical method, since it was used by the Slovenian researchers I. Bratko and M. Guid in their groundbreaking study.¹ The magnitude of an error is essentially the centipawn gap between the evaluation of the move suggested by the engine and that of the move actually made. Smaller differences indicate more accurate play.

Another promising possibility is to use percentages instead of centipawns, i.e. something similar to the Monte Carlo method. The percentage indicates White's score against Black after a move. To find the score, a computer is set to play a certain number of games against itself. Better scores represent more preferable moves. The downside is that it takes a lot of time to get a statistically valid number of moves, especially taking into account the need to ensure that the engine has enough time per move. Otherwise its usefulness in more complicated positions becomes questionable due to the horizon effect. Its advantage primarily lies in theoretically drawn endgame positions, where evaluation-based estimations are known to be unreliable.

The Portuguese scientist D. R. Ferreira has worked out an interesting alternative solution, where what matters is not the gap to the best move in the same position, but the gap between the evaluations of the best moves before and after a move has been made by a player. Like the classical method, Ferreira's method can be used with percentages.

The tables below display the differences between the classical and Ferreira methods in the cases of centipawns and percentages.

move      evaluation   gap      move after Ne5   evaluation   change
13.Ne5    0.34                  13...h6          0.01         -0.33
13.Rc1    0.31                  13...Bc4         0.04
13.e4     0.05         0.29     13...dxc4        1.03
13.Nc2    -0.12                 13...f5          1.34
13.a3     -0.14                 13...Bxe5        1.38

move      percentage   gap      move after Rc1   percentage   change
13.Rc1    55%                   13...h6          52%          -3%
13.Ne5    50%                   13...Bc4         55%
13.e4     50%          5%       13...dxc4        93%
13.Nc2    40%                   13...Bxe5        97%
13.a3     39%                   13...f5          99%
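The two measures in the centipawn table can be reproduced with a few lines of Python. This is only a sketch: the function names are hypothetical, and the evaluations are the ones read from the table (0.34 for 13.Ne5, 0.31 for 13.Rc1, 0.05 for 13.e4, and 0.01 for the best reply after 13.Ne5).

```python
# Evaluations (in pawns) of three candidate moves, as read from the centipawn table.
candidates = {"Ne5": 0.34, "Rc1": 0.31, "e4": 0.05}


def classical_gap(evals, move):
    """Classical measure: evaluation of the engine's best move minus that of `move`."""
    return max(evals.values()) - evals[move]


def ferreira_change(best_before, best_after):
    """Ferreira-style measure: best evaluation after the move minus best evaluation before it."""
    return best_after - best_before


gap = classical_gap(candidates, "e4")                         # 0.34 - 0.05
change = ferreira_change(best_before=0.34, best_after=0.01)   # 0.01 - 0.34

print(round(gap, 2), round(change, 2))
```

The classical measure needs only the engine's list of candidate moves in one position, whereas the Ferreira-style measure needs the best evaluation of two consecutive positions.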

The ways 3 and 4 are clearly inferior. The fact that a move made by a player coincides with the one chosen by the computer may in most cases indeed indicate a good move. [...] impact as one error of 0.80. The principal problem of both methods lies in the fact that they are too coarse and do not describe the position of moves on the quality spectrum.

However, determining the absolute accuracy of play alone is not sufficient. A performance never happens in a vacuum, isolated from all factors acting upon it. The level of a performance can only be manifested by the co-influence of two factors:

1 potential
2 conditions

Potential is the ability of a player to exhibit as high a standard of performance as possible. It depends on a variety of characteristics that differ for each sport. For instance, physical sports require good physique, stamina and technical skills. In mental sports, such as chess and go, the required characteristics would include short-term memory, calculation speed, intuition etc.

Conditions refers to a set of factors upon which the accuracy of play depends:

1 difficulty of positions
2 thinking time
3 practical play
4 psychology
5 conditions in the venue
6 health
7 level of fatigue

The first three are the most important.

In some positions it is easier to find a good move, whereas in other positions it is more difficult. That is what the term 'difficulty of positions' refers to. There are many ways a position can be difficult; it cannot be described by a single factor alone. It consists of many aspects: for example, a position may be chaotic and complicated, or there may be relatively few good moves in it, or the good moves may appear illogical at first sight etc. Also, difficulty is individual and varies among different players: what is difficult for one player may be easier for another. Computers are generally able to find illogical moves with greater certainty than humans.

Thinking time is simply the time control that games are played under. Over time, time controls have gradually become shorter.

The notion 'practical play' refers to the phenomenon where a player intentionally sacrifices accuracy of play to make matters more difficult for the opponent. The goal is to create a situation where the opponent has to make comparably more effort to maintain the same level of accuracy. There are 3 kinds of situations that can be pursued in practical play:

1 difficulty of positions
2 suitability of the type of positions
3 thinking time.

Suitability of the type of positions indicates how much a certain type of position suits a player and his nature: whether he is familiar with such positions, and whether a given position requires more calculation, knowledge, intuition etc.

In the starting position, and usually in the opening phase of the game, all three factors are even for both players. The aim is to introduce imbalances into the game in one's own favour, so that the opponent has more difficult positions which are also less suitable for him. If the opponent is in time trouble, the aim is to move faster so that he has less pondering time.

Psychology plays an important role in chess. A chess player must be willing to endure competitive stress. It is important that he is able to remain calm in critical moments. Sometimes a chess player allows himself to be disturbed by psychological factors, such as problems in private life or concerns over homeland, relatives and friends, that can affect his concentration on the game and prevent him from going all out. The third type of psychological factor is directly connected to chess: incompatibility with the style of an opponent, fear, or a feeling of uneasiness with him. Probably the most famous example is Shirov's lifetime score against Kasparov, 7:22 with no wins, which is far worse than one could expect from their ratings. Also, one may have gotten used to the style of a particular chess player to the extent that unexpected sudden changes in his play may confuse. Among these are cases where a player who has usually preferred correct and objective play suddenly sacrifices material. Believe him or not?

Consequently, psychological factors can be broken into three main types:

1 factors arising from player's characteristics

[...] today's rating chessmetrics rating 2650 corresponds to.

To find a correlation between modern rating and the accuracy of play, 9 cohorts, one at each 100 elo, were analyzed in the range 1900-2700. In each cohort the rating range was [x-25; x+25], where x signifies the goal rating of a particular cohort. The lowest number of moves was 800.

To find out how strongly engines play, 5 different chess engines from the CCRL 40/40 rating list³ were analyzed. Taken into consideration were at least 50 moves by the following engines: Hiarcs 12.1 ([...]), Crafty 23.0 (2630), Philou 2.8.0 (2367), [...], with 65 seconds per move. The chess interface was Arena [...]. The hardware was an Intel i7 860 @ 2.80 GHz.

Only moves made in more-or-less even positions should generally be considered. If the evaluation of the move suggested by the engine and that of the move actually made on the board were both outside the range [-2.00; 2.00] and had the same sign, the position was considered decisive and the move was discarded.
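The filter for decisive positions can be sketched as follows. The function and constant names are hypothetical; only the 2.00 threshold and the same-sign condition come from the text.

```python
DECISIVE_LIMIT = 2.00  # pawns; the boundary of the "more-or-less even" range


def is_decisive(best_eval, played_eval, limit=DECISIVE_LIMIT):
    """True if both evaluations lie outside [-limit, limit] and share the same sign."""
    both_outside = abs(best_eval) > limit and abs(played_eval) > limit
    same_sign = (best_eval > 0) == (played_eval > 0)
    return both_outside and same_sign


# (best move evaluation, played move evaluation) pairs; made-up values for illustration.
moves = [(3.10, 2.40), (3.10, 1.50), (2.50, -2.50), (-4.00, -2.20)]
kept = [pair for pair in moves if not is_decisive(*pair)]
print(kept)
```

A move that throws away a winning advantage (for example from 3.10 to 1.50) is kept, because only one of the two evaluations is outside the range.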

As a novelty, moves that are very obvious were left out of consideration. A move is considered too easy to spot if it meets both criteria below at every depth, starting from the first ply:

• the move suggested by the engine remains the same
• the gap between the two best moves is always 1.00 or larger.
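A minimal sketch of this obvious-move test is given below. It assumes the engine output has already been collected per ply as two parallel lists (best move and gap to the second-best move); that data layout and the function name are illustrative assumptions, not part of the study.

```python
def is_obvious(best_moves, gaps, min_gap=1.00):
    """True if the best move never changes and the gap to the second-best
    move is at least `min_gap` pawns at every ply searched."""
    if not best_moves or len(best_moves) != len(gaps):
        raise ValueError("need one best move and one gap per ply")
    never_changes = all(move == best_moves[0] for move in best_moves)
    always_wide = all(gap >= min_gap for gap in gaps)
    return never_changes and always_wide


# Made-up engine output for plies 1..4.
print(is_obvious(["Qxd8", "Qxd8", "Qxd8", "Qxd8"], [3.2, 3.0, 2.9, 3.1]))
print(is_obvious(["Qxd8", "Qxd8", "Nf3", "Qxd8"], [3.2, 3.0, 1.1, 3.1]))
```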

One must bear in mind that there is a boundary above which the magnitude of errors is irrelevant. For example, if a player makes a move after which the evaluation drops from 1.3 to -3.04, and another move after which it drops from 0.89 to -11.03, there is no basis for assuming that the former is objectively better than the latter.

Each blue dot represents a position. The red line shows the linear correlation between evaluation and error, also called a slope. Unlike the factors of difficulty, the relationship between accuracy and evaluation does not depend on the player's nature of play. Therefore, if the evaluation were the same for all players, the expected error would have to be derived according to the same formula. But here a new problem arises: the average error varies among players, which affects the slope of the linear relationship. A slope indicates the degree of error change in relation to evaluation changes. To find the relationship between the slope and the average error, 10 randomly picked selections with 1500 positions in each were analyzed. The graph below shows the result that can be taken as a basis.

For example, if a player has an average error of 0.120, then, according to the formula, his slope would be 1.51*0.12+0.02 = 0.20. Increasing the average evaluation by 0.1 would cause the player's average error to be inflated by 0.2*0.1 = 0.02.
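The worked example can be checked numerically. The sketch below assumes the trend line of graph 2 reads f(x) = 1.51x + 0.02 and that the example player has an average error of 0.12; the function names are hypothetical.

```python
def eval_slope(avg_error):
    """Slope of the evaluation-vs-error line as a function of a player's
    average error (assumed trend line: 1.51 * x + 0.02)."""
    return 1.51 * avg_error + 0.02


def error_inflation(avg_error, eval_increase):
    """How much the average error grows when the average evaluation of the
    player's positions rises by `eval_increase` pawns."""
    return eval_slope(avg_error) * eval_increase


print(round(eval_slope(0.12), 2))            # slope for the example player
print(round(error_inflation(0.12, 0.1), 2))  # inflation for +0.1 average evaluation
```

Because the slope itself grows with the average error, weaker players are penalised more strongly than stronger ones for the same shift in average evaluation.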

2.2 Difficulty of positions

This research uses 2 different factors of difficulty:

• the difference between the best and the second best moves, expressed in centipawn units
• complexity

The first one is self-explanatory. The latter needs some explaining. The manner of calculating complexity is taken from the work of Bratko and Guid. Every time the engine proposes a new #1 move, the gap between the best and the second best moves is recorded, and at the end all these gaps are summed together. Here it is presented in the form of the original program code.

complexity := 0
FOR (depth 2 to 12) {
   IF (depth > 2) {
      IF (previous_best_move NOT EQUAL current_best_move) {
         complexity += |best_move_evaluation - second_best_move_evaluation|
      }
   }
   previous_best_move := current_best_move
}

Graph 1: evaluation vs average error (x-axis: evaluation, y-axis: error, with a linear trend line)

Graph 2: eval vs avg error slope depending on the average error (x-axis: avg error, y-axis: eval slope, trend line f(x) = 1,51x + 0,02)

In this study a modified version is used. At depths of 10-15 plies all values are doubled to give them more weight: it is always harder to foresee changes at greater depths, so such a change indicates a more complicated position. Below is a comparative example of how complexity scores are computed in both ways. The rows where the best move changes, highlighted in the original, are marked here with *. The sum in the bottom row shows the degree of complexity.

move   evaluation   difference    depth   move   evaluation   difference
Nc6    [...]        0.00          2       Nc6    [...]        0.00
Nf6    0.54         0.03 *        3       Nf6    0.54         0.03 *
Nf6    [...]        0.00          4       Nf6    [...]        0.00
d6     0.45         0.16 *        5       d6     0.45         0.16 *
Nc6    -0.10        0.00 *        6       Nc6    -0.10        0.00 *
Nc6    0.30         0.00          7       Nc6    0.30         0.00
Nf6    -0.03        0.02 *        8       Nf6    -0.03        0.02 *
Nf6    [...]        0.00          9       Nf6    [...]        0.00
Nf6    [...]        0.00          10      Nf6    [...]        0.00
Nf6    0.19         0.00          11      Nf6    0.19         0.00
Nf6    0.05         0.00          12      Nf6    0.05         0.00
e5     [...]        0.17 *        13      e5     [...]        0.17 (x2) *
e5     [...]        0.04          14      e5     [...]        0.04
e5     [...]        0.14          15      e5     [...]        0.14
sum                 0.38                  sum                 0.55
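Both sums in the table can be reproduced with a short Python sketch. The function name and its parameters are assumptions made for illustration; the move and difference columns are taken from the table (depths 2 to 15), and the doubling from depth 10 onward follows the description of the modified version.

```python
def complexity(best_moves, gaps, first_depth=2, double_from=None):
    """Sum the gap between the two best moves at every depth where the
    engine's best move differs from the one at the previous depth.
    If `double_from` is given, gaps at that depth or deeper count twice."""
    total = 0.0
    previous = None
    for offset, (move, gap) in enumerate(zip(best_moves, gaps)):
        depth = first_depth + offset
        if previous is not None and move != previous:
            weight = 2.0 if double_from is not None and depth >= double_from else 1.0
            total += weight * abs(gap)
        previous = move
    return total


# Best move and gap to the second-best move at depths 2..15 (from the table).
moves = ["Nc6", "Nf6", "Nf6", "d6", "Nc6", "Nc6", "Nf6",
         "Nf6", "Nf6", "Nf6", "Nf6", "e5", "e5", "e5"]
gaps = [0.00, 0.03, 0.00, 0.16, 0.00, 0.00, 0.02,
        0.00, 0.00, 0.00, 0.00, 0.17, 0.04, 0.14]

print(round(complexity(moves, gaps), 2))                  # original version
print(round(complexity(moves, gaps, double_from=10), 2))  # modified version
```

Only the change to e5 at depth 13 falls in the doubled range, which accounts for the whole difference between the two sums.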

The two graphs below illustrate how both factors of difficulty influence the accuracy of play. All analyzed positions are included.

The influence of a factor of difficulty on a player is individual and depends on the player's nature of play. Some players are relatively more susceptible to changes in the difficulty of positions: their accuracy deteriorates faster than that of other players as difficulty increases. As the difficulty of positions cannot be described by one parameter only, it remains possible that different factors have different effects on a player. For example, of two equally strong players, one may have a lower than average tolerance for the factor represented by 'complexity' in this study and a higher than average tolerance for 'difference', while for the other it is completely the other way around.

2.3 Thinking time

There is plenty of information on the Internet about the time controls that have been used in various events throughout history. Unfortunately, it is not always possible to find this information for a particular event. In such cases the following principle was applied:

1880 - 1925: 4 min per move
1926 - 1945: 3 min 20 s per move
1946 - 1985: 3 min 45 s per move
1986 - ...: 3 min per move

K. [...] + 10''/move - 300 elo.⁴ According to that, a twofold difference in thinking time is worth roughly 110-120 elo, and the relationship between them is logarithmic. For engines the difference is worth 66 elo.
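A logarithmic relationship of this kind can be written as a fixed elo value per doubling of thinking time. The sketch below uses the 66 elo per doubling quoted for engines; the function name and the choice of a base-2 logarithm are illustrative assumptions consistent with "worth per doubling".

```python
import math


def elo_shift(time_ratio, elo_per_doubling):
    """Rating change expected when thinking time is multiplied by `time_ratio`,
    assuming a constant elo gain for every doubling of time."""
    if time_ratio <= 0:
        raise ValueError("time_ratio must be positive")
    return elo_per_doubling * math.log2(time_ratio)


ENGINE_ELO_PER_DOUBLING = 66  # value given in the text for engines

print(elo_shift(2.0, ENGINE_ELO_PER_DOUBLING))   # twice the time
print(elo_shift(0.5, ENGINE_ELO_PER_DOUBLING))   # half the time
print(elo_shift(4.0, ENGINE_ELO_PER_DOUBLING))   # four times the time
```

For human players the same function applies with the human per-doubling value in place of the engine one.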

The biggest concern in games of earlier times is adjournments. There is no doubt that the possibility to analyze a game, either alone or with assistants, greatly helps the accuracy of the moves played after the game is resumed. It would be necessary to know how long those breaks lasted before play was resumed, and whether analysis was allowed. As in the case of time controls, information is rather scarce. In the absence of reliable information, 1 hour was added as compensation to the time control of each game that was adjourned.

Sometimes the time control information does not specify the number of moves remaining after the 40th/60th move, only the number of minutes. In such cases the remaining amount of time was divided by the number of moves actually played. If the result exceeds the time per move specified in the first part of the time control, the average thinking time in the given phase of the game is considered the same as in the preceding phase.
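The rule for an unspecified later phase amounts to capping the computed time per move at the rate of the preceding phase. A minimal sketch, with a hypothetical function name and made-up numbers:

```python
def later_phase_time_per_move(remaining_seconds, moves_played, previous_phase_rate):
    """Average thinking time per move (seconds) in a phase whose move count is
    not specified: remaining time divided by the moves actually played,
    but never more than the rate of the preceding phase."""
    if moves_played <= 0:
        raise ValueError("moves_played must be positive")
    per_move = remaining_seconds / moves_played
    return min(per_move, previous_phase_rate)


# One hour for the rest of the game, first phase played at 180 s per move.
print(later_phase_time_per_move(3600, 10, 180))  # short phase: capped at 180
print(later_phase_time_per_move(3600, 40, 180))  # long phase: 90 s per move
```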

2.4 Practical play

Of the three possible manifestations of practical play, only the difficulty of positions is looked at here. Ideally, it would have been preferable to use the suitability of position types and thinking time as well. But in the first case it would have been necessary to devise a way to quantify the suitability of the types of positions for players. In the latter case, knowledge of the precise amount of time spent on each move would have been needed. Both are unrealizable at the current juncture. The method in itself is simple: measure and compare the difficulty of positions for either side of the board. If one side has positions that are easier to play, it may be assumed that its results are better than its accuracy of play would suggest. The effect of the difference in difficulty between the players depends on two factors:

a) the degree of the difference between the difficulty of positions

b) the sensitivity of a player's accuracy of play to difficulty

Generally there is no data available on the tolerance of particular players with respect to a changing difficulty level. In such cases it is possible to use a generalized sensitivity to both factors of difficulty of positions, which depends on the average error. In order to find this, we first take the average rating of all opponents and look up its equivalent average expected error in the error-rating table. Secondly, we determine the relationship between average error and slopes for both types of difficulty, as shown in the graphs below. Similarly to graph 2, each data point represents a randomly selected dataset of 1500 positions.

4 http://www.chessgames.com/perl/chess.pl?tid=80980&kpage=0#reply575

Graph 6: complexity slope depending on avg error (x-axis: avg error, y-axis: slope)

Graph 7: difference slope depending on avg error (x-axis: avg error, y-axis: slope)

Graph 5: Dependence of performance on thinking time (x-axis: time coefficient, y-axis: elo performance)

Slope increases with average error. Hence, if our opponent had an average expected error of 0.052, its complexity vs average error slope would be 0.61*0.052 [...] 0.93 = 0.14x and its difference vs average error slope 0.24*ln(0.052)+0.92 = 0.21x.

It is not necessary to include practical play if both sides of the games are taken into analysis. In that case the differences in difficulty, suitability, thinking time etc. would cancel each other out. If a game is on average more difficult for one player by x hypothetical units, then his opponent has, at the same time, an easier game by -x units, and the sum would always be zero. For this reason, practical play has only been included in the analysis of the games of the 7 most remarkable performances in the history of chess. As for the rest of the games, both sides have been taken into account.

2.5 Finding the strength of play

Having determined the absolute accuracy of play and the aforementioned factors that affect it, it becomes possible to derive the expected error of players. This consists of the following steps:

1. Find the average expected error of players.
2. Establish a relation between a modern rating and the expected error.
3. Find the modern rating equivalent of the expected error.
4. Find out rating losses/gains due to time control and practical play.

As a result we get a supposed present-day rating corresponding to the strength of play. Unfortunately, one must accept that full confidence can never be attained. The methods described here are by no means 100% reliable, as the approach is still in its infancy and the chess engines of today have limited abilities.

The expected error indicates a player's hypothetical accuracy of play (average error) if the difficulty of positions and the evaluation were exactly the same for all players. In this study the average complexity (0.58) and the average difference (0.53) of all moves valid for comparison were used to represent a common ground. The graph below, which shows how the accuracy of Capablanca and Kramnik changes as a function of complexity, also depicts how the expected error is determined with the help of linear trend lines.

As we can see, Capablanca's expected error by complexity is 0.070, and that of Kramnik is 0.084. Kramnik's positions were a little more complicated than average, while the complexity of Capablanca's positions was far lower than the average complexity of all positions; therefore the gap between their accuracies of play would be smaller if they both had positions of the same complexity. The expected error according to the difference is found by the same method. One can also note that Capablanca's accuracy of play has been less dependent on difficulty than Kramnik's.
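The trend-line step can be sketched in Python as an ordinary least-squares fit followed by evaluation at the common complexity. The data points below are invented purely for illustration, the function names are hypothetical, and 0.58 is used as the common complexity on the assumption that this is the value the text gives.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def expected_error(complexities, errors, common_complexity):
    """Error the player's own trend line predicts at the common complexity."""
    slope, intercept = fit_line(complexities, errors)
    return slope * common_complexity + intercept


# Hypothetical player: average error observed in four complexity bins.
complexities = [0.2, 0.4, 0.6, 0.8]
errors = [0.05, 0.07, 0.09, 0.11]

print(round(expected_error(complexities, errors, 0.58), 3))
```

A player whose positions were simpler than average gets an expected error above the actual one, and a steeper trend line means a stronger dependence on difficulty.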

Graph 8: Finding expected error (x-axis: complexity, y-axis: error; series for Capablanca and for Kramnik, each with a linear trend line; annotations mark the average complexity of Capablanca's positions, of Kramnik's positions and of all positions, the actual average error of each player's moves, the expected error of both players' moves, and the change in each player's accuracy due to changing difficulty)

3. Results

The following section is divided into two parts. First, all the necessary data on all analyzed chess-playing entities will be dealt with, and then, step by step, based on that, we will find the hypothetical strength of play of each player.

The most important of these is, of course, the actual accuracy of play, i.e. the average error. The result of a game depends only on differences in the accuracy of play. However, it must be borne in mind that the average error never directly shows the level of chess skills, but rather remains biased towards players with a more positional style and longer time controls. The following graph displays all chess-playing entities sorted by average error.

Expectedly, most engines occupy the top spots.


It stands out that Capablanca had by far the least difficult positions. Much has been said about Fischer's simple style of play, his tendency to avoid complications. Indeed, according to graph 11, the complexity of his positions was below average in the games against Larsen and Taimanov. However, it can be seen in graph 10 that the average difference between the two best moves in Fischer's positions is above average. The fact that a position seems somewhat easy to us does not automatically mean it would be easy to find accurate moves. It is perhaps not quite surprising that correspondence games from chessgames.com have the lowest average evaluation, i.e. in those games the positions stayed equal longer due to the higher quality of play.

Graph 12: Average evaluation of positions (x-axis: evaluation)

Graph 13: Average expected error (x-axis: average expected error)

The graph above shows the average expected error, which is derived by taking the average of the expected errors by complexity and by difference and includes changes in the average error due to the evaluation. The results are more logical compared to what was displayed in graph 9. Correspondence games are left out, as there is no point in measuring changes in the accuracy of play if its estimation cannot be trusted. As a rule in all kinds of measurements, the gauge must be of higher quality or trustworthiness than the things being measured. The methods used in this paper are simply not adequate for modern software-assisted correspondence games.

The next step is to take the data from the previous graph to find the relationship between the rating and the quality of play.

Graph 14: The relationship between accuracy and fide rating 2008 (x-axis: rating, from 3000 down to 1600; y-axis: average expected error; logarithmic trend line; boundary of trustability = 0,031)

The relationship appears to be logarithmic. The black line depicts the approximate boundary of trustability below which engine output cannot be trusted. It is interesting to note that it crosses the trend line at 2931 ELO, which may indicate that the level of play of the combination of engine, hardware and time used here is equal to 2931 FIDE 2008. But that is naturally speculation which needs further research.

Players ranked according to thinking time:

Graph 15: Thinking time (x-axis: time control)

The farther back in time, the longer the time controls are.

The next steps represent an attempt to factor in at least a fraction of the generally unfathomable and messy notion called practical play.

Players ranked according to relative difficulty of positions:

Graph 16: Relative difficulty of positions (series: complexity, difference)

A negative value shows that the opponents had easier positions; in the case of a positive one it is the other way around. As one could have expected, Kasparov and Lasker, players known for practical play, are situated on top. Somewhat surprisingly, it appears that even Capablanca had both difficulty factors easier than his opponents. One of the reasons could be that the easier one's positions, the greater the probability that the opponent's positions are more difficult, regardless of the degree of practicality in one's play. Kramnik's positions were expectedly easier than Kasparov's in the title match of 2000.

Before trying to find out how much exactly a difficulty differential influences the opponent's play, it is necessary to know the strength of the opponents. First we look up the FIDE or chessmetrics rating of that time and translate it into its contemporary rating equivalent.

Graph 17: evolution of chess strength by decades (x-axis: decades; y-axis: FIDE 2008; series: elo chessmetrics 2650 and elo chessmetrics highest, each with a logarithmic trend line)

The blue line represents actual data based on the analysis of randomly picked games (rating range 2600-2700) from each decade. The red line represents the top-rated players' strength of play. The gap in each decade between a top-rated player and a 2650-rated player is based on the arithmetical averages of the January lists in the same decade. It can be seen that, if the logarithmic trend line can be trusted, top players first reached FM level (2300 ELO) already in the mid-19th century. The level of an International Master (2400-2500) was achieved in the 1880-1890s. GM level was reached during the first decades of the XX century. Development was relatively quick-paced at that time. Top players were on today's Super GM level already in the 40s.

The graph below demonstrates the opponent ratings and their actual strengths.

Not surprisingly, Kramnik had the strongest opponent, facing Kasparov in 2000. The weakest opposition was that of Capablanca in New York 1924. By translating those ratings into average expected error and taking into account how the generalized sensitivity to either type of difficulty, as described in section 2.4, depends on it, it can be ascertained how much difficulty differentials affect performance.

The next graph shows the final conclusion of this work. The blue bars indicate ratings directly derived from the expected error, as shown in graph 14.

The winners according to this criterion are Hiarcs 12.1 and Carlsen in Nanjing 2009. It may seem surprising that his actual accuracy of play in that tournament was almost 100 points lower than his official TPR (3001). But it can be explained by two facts: against [...]. It is hardly a surprise that at the bottom of the graph are situated those who should be there: 1900-rated players and blitz games of 2700-rated players.

4. Miscellaneous

This section presents several additional interesting conclusions that the collected data offers.

4.1 Chessmetrics 3-year peak top 50

The table below compares 3-year peak ratings for each player, taken from the chessmetrics.com site, and their FIDE equivalents in 2008. The year indicates the middle year of the three-year period.

     name          year  chessmetrics  FIDE 2008
 1   Kasparov      1990  2874          2861
 2   Fischer       1972  2867          281*
 3   Capablanca    1920  2857          2664
 4   Lasker        1895  2855          2562
 5   Botvinnik     1946  2852          238*
 6   Alekhine      1931  2841          2684
 7   Karpov        1989  2833          2819
 8   Anand         1998  2822          2825
 9   Kramnik       2001  2815          2824
10   Pillsbury     1901  2806          2540
11   Maroczy       1906  2799          2554
12   Korchnoi      1979  2798          263*
13   Tarrasch      1895  2796          2503
14   Ivanchuk      1992  2794          285*
15   Steinitz      1885  2794          2451
16   Smyslov       1955  2793          203*
17   Petrosian     1962  2789          216*
18   Tal           1960  2786          208*
19   Rubinstein    1912  2781          2559
20   Reshevsky     1953  2776          2681
21   Najdorf       1947  2775          2664
22   Zukertort     1884  2774          2425
23   Keres         1956  2773          2685
24   Nimzowitsch   1929  2770          260*
25   Bronstein     1951  2770          2669
26   Spassky       1970  2767          212*
27   Kamsky        1995  2765          262*
28   Chigorin      1896  2763          245*
29   Marshall      1917  2759          2556
30   Leko          2001  2757          266*
31   Janowsky      1904  2757          2504
32   Fine          1940  2756          2626
33   Topalov       1997  2754          255*
34   Salov         1994  2754          249*
35   Gelfand       1992  2754          245*
36   Shirov        2000  2753          260*
37   Bogoljubow    1927  2753          2583
38   Geller        1963  2752          2681
39   Morozevich    2000  2751          258*
40   Euwe          1936  2750          2608
41   Adams         2001  2749          258*
42   Polugaevsky   1977  2748          209*
43   Beliavsky     1988  2747          231*
44   Timman        1989  2747          233*
45   Schlechter    1911  2747          2522
46   Portisch      1980  2746          213*
47   Stein         1966  2745          2681
48   Vaganian      1985  2744          221*
49   Jussupow      1987  2744          225*
50   Larsen        1970  2744          2689

* One digit of this value is missing in the source; the surviving digits are shown in order.

According to that table, the strongest level of play of all time was shown by Kasparov during 1989-1991, when his play supposedly would have been rated circa 2860 in 2008. Yet it must be taken into account that this table is a bit misleading. It is often so that a player's chessmetrics rating a decade or more later is only a few points below his peak, but his play is nevertheless better due to the general rise in chess skills. For example, Lasker's chessmetrics rating in 1894 was 2878, but in 1917 it was 2860, whose 2008 equivalent would have been ca 2650. Kasparov's rating in 1999 was 2884, merely 2 points lower than the best he achieved in 1993, but there is no doubt that his actual quality of play had improved by that time.

4.2 Comparison of human and engine ratings

In this work there is enough data on both humans and engines for making interesting comparisons between such different types of players. Below is a side-by-side comparison of the relationships between the accuracy of play and the ratings of either type. CCRL ratings are given as of 17.10.2014. According to the site, the time control was chosen in such a way as to be equivalent to 40 moves in 40 minutes on an Athlon 64 X2 4600+ (2.4 GHz).

As we can observe, the trend lines are of opposite nature. The relationship between the accuracy of play of human chess players and the rating is logarithmic: on lower levels, the accuracy gaps are smaller than at the top. In the case of engines, on the other hand, it is completely opposite: exponential. It should also be noticed how closely the engine trend line follows the actual line representing the accuracy of play; compared with the human graph, there is a stark contrast. It confirms what has long been known: computers' play is far more stable.

Based on the data on those two graphs, it is possible to compile conversion tables for finding one-to-one correspondences between both rating systems.

Graph 20: FIDE rating vs accuracy
[Chart "The accuracy of play and FIDE rating 2008": x-axis FIDE rating (3000 down to 1600); y-axis expected error (0 to 0,28).]

Graph 21: CCRL rating vs accuracy
[Chart "The accuracy of play and CCRL rating 2014": x-axis CCRL 40/40 rating (1500 to 3300); y-axis expected error (0 to 0,28).]

CCRL 40/40   FIDE 2008 40/90+30
3400         2926
3300         2921
3200         2916
3100         2911
3000         2904
2900         2897
2800         2888
2700         2877
2600         2864
2500         2849
2400         2830
2300         2805
2200         2774
2100         2734
2000         2679
1900         2603
1800         2493
1700         2325
1600         2054

FIDE 2008 40/90+30   CCRL 40/40
2900                 2941
2850                 2570
2800                 2281
2750                 2136
2700                 2034
2650                 1957
2600                 1896
2550                 1847
2500                 1805
2450                 1770
2400                 1739
2350                 1712
2300                 1688
2250                 1667
2200                 1647
2150                 1630
2100                 1614
2050                 1599
2000                 1586
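As an illustration of how such a conversion table might be used between its tabulated points, the following minimal sketch interpolates linearly. It is not part of the original study: the function name `ccrl_to_fide` and the choice of linear interpolation are assumptions, and only those CCRL-to-FIDE rows whose digits are fully legible in the source are included.

```python
from bisect import bisect_left

# Rows taken from the CCRL 40/40 -> FIDE 2008 table above
# (only rows with fully legible digits; the selection is an assumption).
CCRL_TO_FIDE = [
    (1600, 2054), (1800, 2493), (1900, 2603), (2300, 2805), (2400, 2830),
    (2500, 2849), (2600, 2864), (2800, 2888), (3000, 2904), (3100, 2911),
    (3200, 2916), (3300, 2921), (3400, 2926),
]


def ccrl_to_fide(ccrl: float) -> float:
    """Linearly interpolate a FIDE 2008 equivalent for a CCRL 40/40 rating."""
    xs = [x for x, _ in CCRL_TO_FIDE]
    if not xs[0] <= ccrl <= xs[-1]:
        raise ValueError("rating outside the tabulated range")
    i = bisect_left(xs, ccrl)
    if xs[i] == ccrl:
        # Exact table entry: return it unchanged.
        return float(CCRL_TO_FIDE[i][1])
    (x0, y0), (x1, y1) = CCRL_TO_FIDE[i - 1], CCRL_TO_FIDE[i]
    return y0 + (y1 - y0) * (ccrl - x0) / (x1 - x0)


if __name__ == "__main__":
    for rating in (2450, 3000, 3050):
        print(rating, ccrl_to_fide(rating))
```

Because the relationship is strongly curved at the lower end, linear interpolation is only a rough guide where the tabulated points are far apart.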

Graph 22: human and engine rating comparison
[Chart "Relationship between FIDE and CCRL ratings": axes CCRL 2014 40/40 and FIDE 2008 40/90+30 (ticks 2000 to 3000).]

Graph 23: engine and human rating comparison
[Chart "Relationship between CCRL and FIDE ratings": axes FIDE 2008 40/90+30 and CCRL 2014 40/40 (ticks 1300 to 3100).]

At first sight, it may seem surprising that the best chess engines are, according to this, so weak compared to humans. But it must be taken into consideration that CCRL games are run on quite weak hardware and the rate of play is nearly 3x quicker than the standard FIDE time control. It can be concluded from the graph that in the beginning it was quite easy for engines to make progress against humans, but with time it is getting increasingly harder. Note: comparisons were made on the assumption that humans play against engines as they would against other humans, i.e. not using any anti-computer strategies. Unfortunately there is not yet a reliable way to emulate anti-computer play and its effects.

The reasons why the relationships between the accuracy and strength of play are exactly like that are unknown to the author. One feasible reason could be that the nature of the curve is related to the relative importance of calculation versus evaluation. The larger the relative importance of calculation in the move-choosing process (engines), the steeper the exponential curve is: while gaining in rating points, there is an ever-decreasing rate in the accuracy gain. But if a player has a larger relative importance of evaluation (humans), then there is a contrary phenomenon: rating growth means a faster increase in the accuracy. If it is true, then there presumably must be such a hypothetical mutual relationship between calculation and evaluation where the accuracy vs rating relationship is linear.

4.3 Rating inflation and deflation



On the first graph we can see that the gap has decreased from 60 points in 1970 to 37 points in 2014, that is, about 5 points per decade. Two 'mountains' and four 'valleys' are clearly recognizable. The valleys refer to periods where rating numbers were quite high because of a dominant player: Fischer, Kasparov (two periods), and Carlsen. The mountains mark periods with equality and no clear dominator. On the other graph there is a completely different situation. The ratings of the 1st players of the rating lists have been relatively stable since the 1890s, while the skill level has steadily been rising. In other words, what we see there is the deflation in the chessmetrics rating system. The rate of deflation has decreased somewhat since the 1960s, which is logical, as the rate of improvement of playing skills must slacken over time. Here too, 'valleys' from domination periods can be seen. The rate of deflation is 608 points within 1843-2004, or 3.78 points per year.
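The two rates quoted above follow from simple arithmetic. The sketch below reproduces them, reading the figures in the text as 60 points in 1970 and 37 points in 2014 for the FIDE gap, and 608 points over 1843-2004 for chessmetrics; the function name is hypothetical and the figures are taken from the paragraph above, not from the underlying data.

```python
def change_per_year(total_change: float, start_year: int, end_year: int) -> float:
    """Average change per year over the interval [start_year, end_year]."""
    return total_change / (end_year - start_year)


# FIDE: the gap shrinks from 60 to 37 points between 1970 and 2014.
fide_per_decade = 10 * change_per_year(60 - 37, 1970, 2014)

# Chessmetrics: 608 points of deflation between 1843 and 2004.
chessmetrics_per_year = change_per_year(608, 1843, 2004)

if __name__ == "__main__":
    print(f"FIDE inflation: {fide_per_decade:.1f} points per decade")
    print(f"chessmetrics deflation: {chessmetrics_per_year:.2f} points per year")
```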

4.4 Various trends

Previously we looked at how the strength of play had changed across history and across the two distinct rating systems. However, the same can be applied to changes in other factors, such as slope and both factors of difficulty. Before having a closer look at those, a short introduction to the notion of 'slope' and what it actually indicates is presented below.

As persons more familiar with chess know, chess players can be split into two groups based on the nature of play:

1. positional, where intuition and knowledge prevail

2. tactical, where the speed of calculations, precision and creativity are most important

It is well known that the nature of play dictates the choice of openings and the type of positions, but differences are also present in players' tolerances with respect to the difficulty of positions and thinking time. If one tries to solve a problem by calculating variations and possible outcomes, then it generally takes a lot of time before a solution is reached. On the other hand, it is universal: calculation is suitable for solving problems of any type and any difficulty. The advantage of problem solutions based on knowledge or intuition is speed. It takes almost no time to recall facts from memory or realize something via intuition. Their disadvantage is that they are only suitable for relatively simple and more familiar problems; when solutions are illogical and unexpected, they fail. From this, the following facts follow:

1. players of positional type are relatively less sensitive to thinking time, but more sensitive to the difficulty of positions

2. it is the contrary with tactical players: they are less sensitive to the difficulty of positions, but more sensitive to thinking time

Hence the size of the slope of the relationship between average error and a factor of difficulty depends on player type. Tactical players have a smaller slope, positional ones a bigger one. Such a phenomenon may give us a simple method to find out which players have a bigger relative importance of calculation and which ones of intuition/knowledge in their move-finding processes.

Graph 25: deflation in the chessmetrics rating system
[Chart "Chessmetrics rating deflation": x-axis year; y-axis gap (-800 to 100); linear trend f(x) = 3,78x - 7530,16, R² = 0,92; annotation: 38 points per decade.]

The graph on the left shows the absolute average slope of all entities covered in this study. But, as can be seen on graphs 6 and 7, the size of the slopes depends on the average error. Therefore, it is preferable to determine how much the actual slope deviates from the expected slope. The formulas for calculating the expected slope are given here:

\[ \text{complexity relative slope} = \frac{\text{absolute slope}}{0.61 \cdot (\text{avg error})^{0.93}} \]

and

\[ \text{difference relative slope} = \frac{\text{absolute slope}}{0.4 \cdot \ln(\text{avg error}) + 0.79} \]
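The two formulas above can be sketched in code as follows. This is only an illustration under stated assumptions: the constants are read from the formulas as printed (0.61 and 0.93 for complexity, 0.4 and 0.79 for difference), the 0.93 is assumed to be an exponent, and the function names are hypothetical.

```python
import math


def expected_complexity_slope(avg_error: float) -> float:
    """Expected slope for the complexity factor (power law, assumed form)."""
    return 0.61 * avg_error ** 0.93


def expected_difference_slope(avg_error: float) -> float:
    """Expected slope for the difference factor (logarithmic, assumed form)."""
    return 0.4 * math.log(avg_error) + 0.79


def relative_slope(absolute_slope: float, expected_slope: float) -> float:
    """Ratio of the measured slope to the slope expected at that average error."""
    return absolute_slope / expected_slope


if __name__ == "__main__":
    avg_error = 0.160  # hypothetical player with the study-wide average error
    absolute = 0.15    # hypothetical measured complexity slope
    print(relative_slope(absolute, expected_complexity_slope(avg_error)))
```

A relative slope of 1 means the measured slope matches what is expected for that level of accuracy; values below 1 point towards calculation, values above 1 towards intuition and knowledge.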

The smaller the value, the bigger the importance of calculation. The graph on the right reveals that all chess engines are in the top half, as expected. Capablanca and Kramnik are situated in the bottom half, confirming the common belief that those players were primarily intuitive players. Perhaps surprisingly, Kasparov and Karpov stand close to each other, and Lasker is so far down.

The changes of the average relative slopes across time and both rating systems are presented below.

Graph 26: players sorted by absolute slopes
[Bar chart "Absolute slopes": one bar per entity (individual players, decades from the 1860s to the 2000s, FIDE rating groups, blitz and rapid games of 2700-rated players, engines); axis from -0,1 to 0,7; series: complexity, difference, average.]

Graph 27: players sorted by relative slopes
[Bar chart "Relative slopes": one bar per entity (individual players, decades from the 1860s to the 2000s, FIDE rating groups, blitz and rapid games of 2700-rated players, engines); axis from -1,00 to 3,00; series: complexity, difference, average.]

Graph 28: relative slope across time periods
[Chart "Relative slopes vs decades": x-axis decades (1860s to 2000s); y-axis relative slope (0 to 2,5); series: average, complexity, difference, each with a logarithmic trend line.]

It appears that the average relative slope slowly decreases with time; the rate of decrease is even bigger in the CCRL rating system. It exhibits a completely different behaviour in the FIDE rating system, where stronger players have bigger slopes. In other words, today's chess players have become more calculative than they were in the past. In a sense, it is logical: after My System by Nimzowitsch there has been no significant breakthrough in chess middlegame theory.


Positions have become somewhat easier compared to earlier times. Especially eye-catching is the low point between 1920 and 1940. Regarding FIDE rating, it looks like stronger players have a tendency to make positions more complicated. Better chess engines, on the other hand, end up playing in relatively easier positions.

4.5 Influence of errors

As a final part, here is data on the influence of errors of various magnitudes. The influence of an error magnitude is calculated by multiplying its frequency by its magnitude. By comparing the resulting number with those of other magnitudes, it can be ascertained which magnitudes of errors are the biggest source of inaccurate play. On the graphs below each blue datapoint marks the product of the magnitude of an error and its frequency. The red line shows the moving average of 9 datapoints. The upper graph is based on data used in this study (1777 positions, average error 0.160); the lower graph represents data taken from an earlier study (5) (14 174 positions, average error 0.132).

5 http://www.chessanalysis.ee/summary450.pdf
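The calculation described above (frequency multiplied by magnitude, smoothed with a 9-point moving average) might be sketched as follows. This is an assumption-laden illustration rather than the author's procedure: the bin width of 0.01, the use of raw counts as the frequency, the centred window with shorter windows at the edges, and both function names are hypothetical.

```python
from collections import Counter


def error_influence(errors, bin_width=0.01):
    """Return sorted (magnitude, influence) pairs.

    Influence of a magnitude = number of errors in that bin * magnitude,
    i.e. the sum of errors of that size (frequency taken as a raw count).
    """
    counts = Counter(round(e / bin_width) for e in errors if e > 0)
    return [(b * bin_width, n * b * bin_width) for b, n in sorted(counts.items())]


def moving_average(values, window=9):
    """Centred moving average; windows are truncated at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


if __name__ == "__main__":
    sample = [0.0, 0.05, 0.05, 0.1, 0.1, 0.1, 0.2, 0.2, 0.5, 1.2]  # made-up errors
    pairs = error_influence(sample)
    smooth = moving_average([v for _, v in pairs])
    for (magnitude, influence), avg in zip(pairs, smooth):
        print(f"{magnitude:.2f}  {influence:.2f}  {avg:.2f}")
```

The magnitude at which the smoothed curve peaks is the error size that contributes most to the total inaccuracy.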

Graph 33: complexity across FIDE rating
[Chart "Complexity vs FIDE": x-axis FIDE 2008 rating (1900 to 2700); y-axis complexity (0,4 to 0,6).]

Graph 34: difference across FIDE rating
[Chart "Difference vs FIDE": x-axis FIDE 2008 rating (1900 to 2700); y-axis difference (0,2 to 0,4).]

Graph 35: complexity across CCRL rating
[Chart "Complexity vs CCRL": x-axis CCRL 2014 rating (1800 to 3000); y-axis complexity (0,4 to 0,6).]

Graph 36: difference across CCRL rating
[Chart "Difference vs CCRL": x-axis CCRL 2014 rating (1800 to 3000); y-axis difference (0,16 to 0,26).]

Graph 37: influence of errors 1
[Chart "Influence of errors 1 (avg error 0,160)": x-axis avg error; y-axis error sum (0 to 24); series: sum of errors, moving average of 9.]

Despite the fact that the average error is significantly different in the two datasets, side-by-side comparison reveals, as shown by the red moving average line, that the biggest influence is roughly equal: both are around 0.20. Hence the question: will the main source of inaccurate play also remain the same if we have a separate look at engines and humans with roughly the same accuracy? On the following graphs, the overall average error of all engine moves is 0.084. Human moves were grouped in two, one based on players with average error higher than 0.100 and those whose average error was lower than 0.200. The average errors of all moves combined were 0.071 and 0.243 respectively.

Graph 38: influence of errors 2
[Chart "Influence of errors 2 (avg error 0,132)": x-axis avg error; y-axis error sum (0 to 26); series: sum of errors, moving average of 9.]

Graph 39: Influence of engine errors
[Chart "Influence of engine errors (avg error 0,084)": x-axis avg error; y-axis error sum (0 to 4,5); series: sum of errors, moving average of 9.]

Graph 40: Influence of human errors 1
[Chart "Influence of human errors 1 (avg error 0,071)": x-axis avg error; y-axis error sum (0 to 4,5); series: sum, moving average of 9.]

The results might be described as remarkable. All three graphs show that, irrespective of the accuracy of the moves, the error influence peak is persistently situated around the 0.20 mark. True, in the case of engines there is a small oddity: the peak seems to have a little cavity centered around 0.20, with two ridges surrounding it. It is presently unknown what may be the cause of that. These graphs also explain why it is not advisable to use threshold-based analysis, at least not with threshold values of 0.10 and above. At first glance, small to medium-sized errors may seem insignificant, but what they lack in gravity, they make up for by being more numerous.

5. Conclusion and future perspectives

In this study the author, using Rybka 3, tried to measure the objective strength of play and to determine its relationship with either type of rating system. Besides that, the aim was to record the change of the strength of play through time. It could be compared to athletics world record progression or world leading mark tables, which provide a good overview of development in athletics. Unlike many sports today, chess has been played for centuries, and the level of play has long been very high. As became clear earlier, already by the beginning of the previous century top players were on a par with today's weaker GMs. In the light of this, John Nunn's speculation that Hugo Süchting at the Karlsbad tournament in 1911 was merely a 2100-rated player should be regarded as a serious underestimation. According to the chessmetrics rating after the tournament, he was only 253 points short of Lasker, which would rate the latter ca 250 points lower than can be seen on graph 17. Because so far there has been no reliable method for measuring the quality of play, the overhyping of players of the past has gained ground. Surprisingly many people seriously think that former great figures were at least as talented as today's top players, and that they played as well or even better. This is psychologically completely understandable, if practically unnecessary; we all tend to see the past as more beautiful than it actually was. Players of the past are also overrated because, out of all their games, there is a tendency to selectively highlight the better specimens, whereas in the case of contemporary players, various sites providing live engine-assisted analysis display their average level. And since the population of the world and the number of chess players back then were smaller, the same can be said about the talent pool. It is more probable to find more naturally talented players in a larger pool.

Comparison between the CCRL and FIDE rating systems gave a surprising conclusion. Before that, the author held the opinion that both systems had an analogous relationship between strength and accuracy. It turns out that the relationships are of opposite nature. The accuracy of humans decreases logarithmically with strength of play, at a gradually diminishing rate, but the accuracy of play of engines, on the other hand, decreases at an exponential rate. The fact that engines from the bottom part of the rating list are weak was noted long ago. One conclusion that can be made is that it is virtually impossible to reach negative ratings in engine rating systems, whereas it is very easy in the FIDE rating system. The weak point in the conclusion is that, because of the lack of proper methods, it was not possible to reckon with the impact of anti-computer strategy on results. In the future it would be necessary to devise methods to describe and research it more closely, and how it depends on the strength of engines and the depth of search.

Problematic is the relative instability of human play, which is clearly illustrated on graph 20. It makes the conclusions somewhat untrustworthy. Therefore an increase in the number of analyzed moves per player is recommended.

Graph 41: influence of human errors 2
[Chart "Influence of human errors 2 (avg error 0,243)": x-axis avg error; y-axis error sum (0 to 10); series: sum, moving average of 9.]

The most difficult part in such analysis work is obviously practical play. There were no satisfactory outcomes regarding that. It was found that a phenomenon that could be called 'objectivity-practicality bias' is still present in the results. Therefore the results of players whose difficulty of positions was far from the average must be regarded with caution. The more difficult the positions, the bigger the probability that a player's result according to the analysis turns out to be underrated and, in case of positions far below average difficulty, the results will generally be overrated. Previously we saw that practical play is based on differentials in three categories: difficulty of positions, type of positions and thinking time. But there is also another interesting, at least theoretical, way. It is based on the fact that moves can be characterized not only by quality, but also by how easily or hard they can be noticed. Some moves are fairly obvious, others can seem quite illogical and at first glance wrong. The fact that there are a lot of positions where the best (often the only) move is extremely hard to see is one of the chief reasons why chess is such a difficult and fascinating game. Here we have arrived at the problem of measuring the obviousness of moves. On what basis and how should it be measured?

[...]erez in Thessaloniki 2013 and Kramnik in London Classic 2011 were also remarkable performances. There exists a small possibility that their actual quality of play surpasses that of Carlsen. These three performances definitely deserve further scrutiny. The closer a score is to 100% and the fewer games played, the more unreliable the TPR value will be. Thus it would be interesting to look into discrepancies between TPR and quality of play as a function of score and the number of games. It is worth paying attention to the tournaments of Lasker and Capablanca in New York in 1924 and 1927. Although they both had almost equal chessmetrics TPRs, as shown on graph 19, the difference in the quality of play caused by the objectivity-practicality bias is 205 points. Taking into account the large number of games and the relatively small timespan between them, it is quite probable that the quality of play of both players closely corresponds to the TPRs. For this reason comparing games of Lasker and Capablanca from the New York tournaments would presumably be a good indicator of the trustworthiness of the methods used for analysis.
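The remark that a TPR becomes unreliable as the score approaches 100% can be illustrated with the standard logistic Elo expectation, inverted to give a performance rating. This is an assumption made for illustration: it is not the chessmetrics TPR formula and not FIDE's table-based calculation, and the function name is hypothetical.

```python
import math


def performance_rating(avg_opponent_rating: float, score: float, games: int) -> float:
    """Logistic Elo-style performance rating (illustrative approximation only)."""
    p = score / games
    if not 0 < p < 1:
        # At 0% or 100% the logistic formula diverges, so no finite value exists.
        raise ValueError("performance rating is undefined for 0% or 100% scores")
    return avg_opponent_rating + 400 * math.log10(p / (1 - p))


if __name__ == "__main__":
    # Against hypothetical 2700-rated opposition over 10 games, each extra
    # half point moves the result further the closer the score is to 100%.
    for score in (5.0, 7.5, 9.0, 9.5):
        print(score, round(performance_rating(2700, score, 10)))
```

With few games, a single result near a perfect score therefore changes the computed rating by far more than the same result would near 50%.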

Results also showed that the FIDE rating has been inflating with respect to absolute strength since 1970, at an average rate of about 5 points per decade. It is still relatively modest, which may explain why K.
