8/18/2019 Quality of Play in Chess and Methods for Measuring
Quality of play in chess and methods for measuring
Erik Varend
Tallinn, 2014
Abstract. This study uses the computer to investigate the absolute strength of play of various chess-playing entities (humans and computers). First, the actual accuracy of play is determined, measured as the mean difference between the move suggested by the engine and the move actually made. Then, individually for each entity, the factors that affect the accuracy of play are determined, and an estimated accuracy of play is derived from those factors. It shows what the accuracy would be if all factors were the same for all players. As a result, it was shown that there is a relationship between rating and the quality of play. It was also shown that the further one goes back in time, the more the quality of play decreases. By comparing the accuracy of play of both humans and engines, it was determined to what extent CCRL and FIDE ratings correlate. The author also drew several miscellaneous conclusions based on the collected data.
1. Introduction

The primary aim of this study is to find a correlation between the strength of play (expressed as either a FIDE or a CCRL rating) and the accuracy of play. The 7 most noteworthy performances in the history of chess are also compared, as is the way the strength of chess players has changed over time. The final section of the paper contains various other conclusions that can be drawn from the data collected.

There are different ways to estimate and compare performances:
1 by measuring absolute strength
2 by measuring relative strength.

Absolute strength can be defined by how far a performance is from the perfect performance, i.e. the distance between the actual performance and the best performance possible. The closer a performance stands to absolute perfection, the better it is. In the case of relative strength a performance is compared to the results of other performers, and the actual strength has no importance at all. In circumstances where ascertaining the absolute strength has not been easily feasible, there has normally been no choice but to use relative strength measurement as a yardstick. The latter is prevalent in one-on-one sports such as snooker, tennis and chess, where the ELO rating is used to compare the strength of players. Comparison of players and performances from different epochs is only possible using absolute strength. It is, for example, impossible to say who was stronger, Lasker or Spassky, if using their chessmetrics ratings.

Consequently, we need to find an indicator of absolute strength in chess. There is a variety of ways to do this, which can be split into two primary types:
1 tablebases
2 various computer-based estimations

Certainly, tablebases would be the most preferred, because they give perfect solutions for each position. In that case the accuracy of play would be measured as the mean number of transitions per move. A transition is a change in the state of the game (won, drawn, or lost) assuming perfect play from both sides. A change from a won position to a drawn one, or from a drawn one to a lost one, equals 1 transition. If a won position becomes a lost position, it is 2 transitions. The fewer transitions per move, the higher the quality of play. Four-piece tablebases were completed by the end of the 80s. 5-piece TBs were compiled in the early nineties, those with 6 pieces in 2005, and now we have 7-piece tablebases. This
implies that it's quite hopeless to expect complete 32-piece tablebases in the near future. That's the reason why chess engines are necessary. There are many ways to describe the absolute accuracy with the help of chess engines:
1 The average difference between the best move suggested by the engine and the move actually made
  1.1 difference expressed in centipawns
  1.2 difference expressed in percentages
2 The average change in evaluation after the move made by the player
3 The percentage of moves that coincide with those suggested by the engine
4 The percentage of moves where the error exceeds a predetermined threshold

Version 1.1 can be called the classical method, since it was used by the Slovenian researchers I. Bratko and M. Guid in their groundbreaking study.¹ The magnitude of an error is essentially the centipawn gap between the evaluation of the move suggested by the engine and that of the move actually made. Smaller differences indicate more accurate play.
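As a rough illustration of the classical method, the average error can be sketched as the mean of the per-move gaps. This is only a minimal sketch: the function name and the sample evaluations are hypothetical, and evaluations are assumed to be given in pawns from the point of view of the side to move.

```python
def average_error(best_evals, played_evals):
    """Mean gap, in pawns, between the engine's best move and the move played."""
    if len(best_evals) != len(played_evals) or not best_evals:
        raise ValueError("expected two non-empty lists of equal length")
    gaps = [best - played for best, played in zip(best_evals, played_evals)]
    return sum(gaps) / len(gaps)


# Hypothetical evaluations (in pawns, from the mover's point of view).
best_evals = [0.34, 0.10, 0.55]
played_evals = [0.05, 0.10, 0.40]
print(f"average error: {average_error(best_evals, played_evals):.3f}")
```

A smaller result corresponds to more accurate play; a move that matches the engine's choice contributes a gap of zero.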
Another promising possibility is to use percentages instead of centipawns, i.e. an approach similar to the Monte-Carlo method. The percentage indicates White's score against Black after a move. To find the score, a computer is set to play a certain number of games against itself. Better scores represent moves that are preferable. The downside is that it takes a lot of time to get a statistically valid sample, especially taking into account the need to ensure that the engine has enough time per move. Otherwise its usefulness in more complicated positions becomes questionable due to the horizon effect. Its advantage primarily lies in theoretically drawn endgame positions, where evaluation-based estimations are known to be unreliable.

The Portuguese scientist D. R. Ferreira has worked out an interesting alternative solution, where what matters is not the gap to the best move in the same position, but the gap between the evaluations of the best moves before and after a move has been made by a player. Like the classical method, Ferreira's method can be used with percentages.
The tables below display the differences between the classical and Ferreira methods in the cases of centipawns and percentages.

move     evaluation   gap     move after Ne5   evaluation   change
13.Ne5    0.34                13...h6          0.01         -0.33
13.Rc1    0.31                13...Bc4         0.04
13.e4     0.05        0.29    13...dxc4        1.03
13.Nc2   -0.12                13...f5          1.34
13.a3    -0.14                13...Bxe5        1.80

move     percentage   gap     move after Rc1   percentage   change
13.Rc1    55%                 13...h6          52%          -3%
13.Ne5    50%                 13...Bc4         55%
13.e4     50%         5%      13...dxc4        93%
13.Nc2    40%                 13...Bxe5        97%
13.a3     39%                 13...f5          99%
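The arithmetic behind the two tables can be sketched as follows. The helper names are hypothetical, and the input figures are the table entries as read here (0.34 and 55% for the engine's first choice, 0.05 and 50% for 13.e4, 0.01 and 52% after the best reply), so the snippet is an illustration rather than the author's own procedure.

```python
def classical_gap(best_value, played_value):
    """Classical method: value of the engine's best move minus value of the move played."""
    return round(best_value - played_value, 2)


def evaluation_change(best_before, best_after):
    """Ferreira-style change: best value after the move minus best value before it."""
    return round(best_after - best_before, 2)


# Centipawn table (values in pawns).
print(classical_gap(0.34, 0.05))      # gap for 13.e4
print(evaluation_change(0.34, 0.01))  # change after 13.Ne5 and the reply 13...h6

# Percentage table (values in percent).
print(classical_gap(55, 50))
print(evaluation_change(55, 52))
```

The same two subtractions serve both units, which is why either method can be used with centipawns or with percentages.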
Methods 3 and 4 are clearly inferior. The fact that a move made by a player coincides with the one chosen by the computer may in most cases indeed indicate a good move.
impact as one error of 0.80. The principal problem of both methods lies in the fact that they are too coarse and do not describe the position of moves on the quality spectrum.

However, determining the absolute accuracy of play alone is not sufficient. A performance never happens in a vacuum, isolated from all factors acting upon it. The level of a performance can only be manifested by the co-influence of two factors:
1 potential
2 conditions

Potential is the ability of a player to exhibit as high a standard of performance as possible. It depends on a variety of characteristics that differ for each sport. For instance, physical sports require good physique, stamina and technical skills. In mental sports, such as chess and go, the required characteristics would include short-term memory, calculation speed, intuition etc.

Conditions refers to a set of factors upon which the accuracy of play depends:
1 difficulty of positions
2 thinking time
3 practical play
4 psychology
5 conditions in the venue
6 health
7 level of fatigue

The first three are the most important.

In some positions it is easier to find a good move, whereas in other positions it is more difficult. That's what the term 'difficulty of positions' refers to. There are many ways a position can be difficult, so it cannot be described by a single factor alone. It consists of many aspects: for example, the position may be chaotic and complicated, or there may be relatively few good moves, or good moves may appear illogical at first sight etc. Also, difficulty is individual and varies among players: what is difficult for one player may be easier for another. Computers are generally able to find illogical moves with greater certainty than humans.

Thinking time is simply the time control that games are played under. Over time, time controls have gradually become shorter.

The notion 'practical play' refers to the phenomenon where a player intentionally sacrifices accuracy of play to make matters more difficult for the opponent. The goal is to create a situation where the opponent has to make comparably more effort to maintain the same level of accuracy. There are 3 kinds of situations that can be exploited in practical play:
1 difficulty of positions
2 suitability of the type of positions
3 thinking time.

Suitability of the type of positions indicates how much a certain type of position suits a player and his nature: whether he is familiar with such positions, and whether a given position calls more for calculation, knowledge, intuition etc. In the starting position, and usually in the opening phase of the game, all three factors are even for both players. The aim is to introduce imbalances into the game in one's own favour, so that the opponent has more difficult positions which are also less suitable for him. If the opponent is in time trouble, one can move faster so that he has less pondering time.

Psychology plays an important role in chess. A chess player must be willing to endure competitive stress. It is important that he is able to remain calm in critical moments. Sometimes a chess player allows himself to be disturbed by psychological factors, such as problems in private life or concerns over homeland, relatives and friends, which can affect concentration on the game and hinder going all out. The third type of psychological factor is directly connected to chess: incompatibility with the style of an opponent, fear, or a feeling of uneasiness with him. Probably the most famous example is Shirov's lifetime score against Kasparov, 7:[…] with no wins, which is far more lopsided than one could expect from their ratings. Also, one may have gotten used to the style of a particular chess player to the extent that unexpected sudden changes in his play may confuse. Among these are cases where a player who has usually preferred correct and objective play suddenly sacrifices material. Believe him or not?

Consequently, psychological factors can be broken down into three main types:
1 factors arising from the player's characteristics
today's rating chessmetrics rating 2650 corresponds to.

To find a correlation between modern rating and the accuracy of play, 9 cohorts, one at each 100 elo, were analyzed in the range 1900-2700. In each cohort the rating range was [x-25, x+25], where x signifies the goal rating of a particular cohort. The lowest number of moves was 400.

To find out how strongly engines play, 5 different chess engines from the CCRL 40/40 rating list³ were used. At least 50 moves were taken into consideration for each of the following engines: Hiarcs 12.1 ([…]), Crafty 23.0 (2630), Philou 2.8.0 (2367), […], with 65 seconds per move. The chess interface was Arena […]. The hardware was an Intel i7 860 @ 2.80 GHz.

Only moves made in more-or-less even positions should generally be considered. If the move suggested by the engine and the move actually made on the board were both outside the range [-2.00, 2.00] and had the same sign, the position was considered decisive, and the moves were discarded.

As a novelty, moves that are very obvious were left out of consideration. A move is considered too easy to spot if it meets the two criteria below starting from the first ply:
• the move suggested by the engine remains the same
• the gap between the two best moves is always 1.00 or larger.

One must bear in mind that there is a boundary above which the magnitude of errors is irrelevant. For example, if a player makes a move after which the evaluation drops from 1.3 to -3.04, and another move where the evaluation drops from 0.89 to -11.03, then there's no basis for assuming that the former is objectively better than the latter.
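The three filtering rules above could be sketched roughly as follows. This is a hedged illustration only: the function names are hypothetical, the limits 2.00 and 1.00 are taken from the text as read here, and the cap value is an assumed parameter because the text does not state where the boundary lies.

```python
def is_decisive(best_eval, played_eval, limit=2.00):
    """True if both evaluations lie outside [-limit, limit] with the same sign."""
    same_sign = (best_eval > 0) == (played_eval > 0)
    return abs(best_eval) > limit and abs(played_eval) > limit and same_sign


def is_obvious(best_moves_by_depth, gaps_by_depth, min_gap=1.00):
    """True if the best move never changes and its lead is always at least min_gap."""
    unchanged = len(set(best_moves_by_depth)) == 1
    return unchanged and all(gap >= min_gap for gap in gaps_by_depth)


def capped_error(error, cap):
    """Treat all errors beyond the boundary as equally large (cap is an assumption)."""
    return min(error, cap)


# The two blunders from the example become equal once a (hypothetical) cap of 3.00 is applied.
first = capped_error(1.3 - (-3.04), cap=3.00)
second = capped_error(0.89 - (-11.03), cap=3.00)
print(first == second)
```

Moves flagged by `is_decisive` or `is_obvious` would simply be dropped before the average error is computed.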
Each blue dot represents a position. The red line shows the linear correlation between evaluation and error; its steepness is called the slope. Unlike the factors of difficulty, the relationship between accuracy and evaluation does not depend on a player's nature of play. Therefore, if the evaluation were the same for all players, the expected error would have to be derived according to the same formula. But here a new problem arises: the average error varies among players, affecting the slope of the linear relationship. The slope indicates how much the error changes in relation to changes in evaluation. To find the relationship between the slope and the average error, 10 random selections of 1500 positions each were drawn. The graph below shows the result, which can be taken as a basis.

For example, if a player has an average error of 0.120, then, according to the formula, his slope would be 1.51*0.12+0.02 = 0.20. Increasing the average evaluation by 0.1 would inflate the player's average error by 0.2*0.1 = 0.02.
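The worked example can be reproduced with a few lines of code. The coefficients 1.51 and 0.02 are the trend line of Graph 2 as read here; the function names are hypothetical and the snippet is only a sketch of the arithmetic.

```python
def evaluation_slope(avg_error):
    """Slope of error vs. evaluation as a linear function of the average error."""
    return 1.51 * avg_error + 0.02


def error_inflation(avg_error, eval_increase):
    """Expected increase of the average error when the average evaluation rises."""
    return evaluation_slope(avg_error) * eval_increase


print(round(evaluation_slope(0.12), 2))
print(round(error_inflation(0.12, 0.1), 2))
```

Subtracting this inflation from a player's measured average error is one way to bring players with different average evaluations onto a common footing.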
2.2 Difficulty of positions

This research uses two different factors of difficulty:
• the difference between the best and the second best moves, expressed in centipawn units
• complexity

The first one is self-explanatory. The latter needs some explaining. The manner of calculating complexity is taken from the work of Bratko and Guid. Every time the engine proposes a new #1 move, the gap between the best and the second best moves is recorded, and at the end all these gaps are summed together. Here it is presented in the form of the original program code.
complexity := 0
FOR (depth 2 to 12)
  IF (depth > 2) {
    IF (previous_best_move NOT EQUAL current_best_move) {
      complexity += |best_move_evaluation - second_best_move_evaluation|
    }
  }
  previous_best_move := current_best_move
}

[Graph 1: evaluation vs average error. Scatter plot with a linear trend line; x-axis: evaluation, y-axis: error.]

[Graph 2: eval vs avg error slope depending on the average error. x-axis: avg error, y-axis: eval slope; linear trend line f(x) = 1.51x + 0.02, R² = 0.82.]
In this study a modified version is used. At depths of 10-15 plies all values are doubled to assign them more importance. Changes at greater depths are always harder to see, and they indicate a more complicated position. Below is a comparative example of how complexity scores are computed in both ways. Highlighted in yellow are the cases where the best move changes. The sum in the bottom row shows the degree of complexity.
Rows marked * are those where the best move changes. The original and the modified method differ only at depth 13, where the modified method doubles the difference (x2).

depth   move   evaluation   difference
 2      Nc6    […]          0.00
 3      Nf6     0.54        0.03  *
 4      Nf6    […]          0.00
 5      d6      0.45        0.16  *
 6      Nc6    -0.10        0.00  *
 7      Nc6     0.30        0.00
 8      Nf6    -0.03        0.02  *
 9      Nf6    […]          0.00
10      Nf6    […]          0.00
11      Nf6     0.19        0.00
12      Nf6     0.05        0.00
13      e5     […]          0.17  * (x2 in the modified method)
14      e5     […]          0.04
15      e5     […]          0.14

sum (original method)   0.38
sum (modified method)   0.55
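Both ways of computing complexity can be sketched in a few lines. This is a minimal illustration, not the original program: the function name and the `double_from` parameter are hypothetical, and the input rows are the best moves and differences of the example above as read here.

```python
def complexity(rows, double_from=None):
    """Sum the best-vs-second-best gaps at every depth where the best move changes.

    rows: (depth, best_move, gap) tuples ordered by increasing depth.
    double_from: if given, gaps at this depth or deeper count twice (modified method).
    """
    total = 0.0
    previous_best = None
    for depth, best_move, gap in rows:
        if previous_best is not None and best_move != previous_best:
            weight = 2 if double_from is not None and depth >= double_from else 1
            total += weight * abs(gap)
        previous_best = best_move
    return round(total, 2)


rows = [
    (2, "Nc6", 0.00), (3, "Nf6", 0.03), (4, "Nf6", 0.00), (5, "d6", 0.16),
    (6, "Nc6", 0.00), (7, "Nc6", 0.00), (8, "Nf6", 0.02), (9, "Nf6", 0.00),
    (10, "Nf6", 0.00), (11, "Nf6", 0.00), (12, "Nf6", 0.00), (13, "e5", 0.17),
    (14, "e5", 0.04), (15, "e5", 0.14),
]
print(complexity(rows))                  # original method
print(complexity(rows, double_from=10))  # modified method
```

With these rows the two calls give 0.38 and 0.55, matching the sums in the example; the gaps at depths 14 and 15 are ignored because the best move does not change there.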
The two graphs below illustrate how both factors of difficulty influence the accuracy of play. All analyzed positions are included.

The influence of a factor of difficulty on a player is individual and depends on one's nature of play. Some players are relatively more susceptible to changes in the difficulty of positions: their accuracy deteriorates faster than that of other players as difficulty increases. As the difficulty of positions cannot be described by one parameter only, it remains possible that different factors have a different effect on a player. For example, of two equally strong players, one may have a lower than average tolerance for the factor represented by 'complexity' in this study and a higher than average tolerance for 'difference', while for the other it is completely the other way around.

2.3 Thinking time

There is plenty of information on the Internet about the time controls that have been used in various events throughout history. Unfortunately, it is not always possible to find this information for a particular event. In such cases the following principle was applied:
1880 - 1925: 4 min per move
1926 - 1945: 3 min 20 s per move
1946 - 1985: 3 min 45 s per move
1986 - ...: 3 min per move
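The fallback rule can be written as a small lookup. This is a sketch under the assumption that the periods and times are as read here (4 min, 3 min 20 s, 3 min 45 s and 3 min per move); the function name is hypothetical.

```python
def default_seconds_per_move(year):
    """Assumed thinking time per move when an event's time control is unknown."""
    if year < 1880:
        raise ValueError("no default is given for events before 1880")
    if year <= 1925:
        return 4 * 60        # 4 min
    if year <= 1945:
        return 3 * 60 + 20   # 3 min 20 s
    if year <= 1985:
        return 3 * 60 + 45   # 3 min 45 s
    return 3 * 60            # 3 min


print(default_seconds_per_move(1927))
```

Expressing everything in seconds per move makes the defaults directly comparable with known time controls.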
D.
+ 10''/move - 300 elo.⁴ According to that, a twofold difference in thinking time is equal to […] elo, and the relationship between them is logarithmic. For engines the difference is worth 66 elo.

The biggest concern in games of earlier times is adjournments. There's no doubt that the possibility to analyze games, either alone or with assistants, greatly helps the accuracy of the moves played after resuming the game. It would be necessary to know how long those sessions lasted before play was resumed, and whether analyzing was allowed. As in the case of time controls, information is rather scarce. In the absence of reliable information, 1 hour was added as compensation to the time control of each game that was adjourned.

Sometimes the time control information specifies only the number of minutes for the phase after the 40th/60th move, not the number of moves. In such cases the remaining amount of time was divided by the number of moves actually played. If the result exceeds the time per move specified in the first part of the time control, then the average thinking time in that phase of the game is considered the same as in the preceding phase.
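The rule for a later phase of the time control can be sketched as below. The function and parameter names are hypothetical; the logic simply divides the remaining time by the moves actually played and never lets the result exceed the rate of the preceding phase.

```python
def later_phase_seconds_per_move(remaining_seconds, moves_played, previous_rate):
    """Average thinking time per move in a phase with no specified move count."""
    if moves_played <= 0:
        raise ValueError("moves_played must be positive")
    rate = remaining_seconds / moves_played
    # If the computed rate exceeds the earlier phase's rate, reuse the earlier rate.
    return min(rate, previous_rate)


# One extra hour, first phase at 225 s per move (hypothetical game lengths).
print(later_phase_seconds_per_move(3600, 10, 225))
print(later_phase_seconds_per_move(3600, 30, 225))
```

In the first call only 10 moves were played, so the first-phase rate of 225 s is kept; in the second, 30 moves give 120 s per move.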
2.4 Practical play

Of the three possible manifestations of practical play, only the difficulty of positions is looked at here. Ideally, it would have been preferable to use the suitability of position types and thinking time as well. But in the first case it would have been necessary to devise a way to quantify the suitability of the types of positions for players. In the latter case, knowledge of the precise amount of time spent thinking on each move would have been needed. Both are unrealizable at the current juncture. The method in itself is simple: measure and compare the difficulty of positions for either side of the board. If one side has positions that are easier to play, it may be assumed that its results are better than its accuracy of play would suggest. The effect of the difficulty difference between the players depends on two factors:
a) the degree of the difference between the difficulty of positions
b) the sensitivity of a player's accuracy of play to difficulty

Generally there's no data available on the tolerance of particular players with respect to changing difficulty level. In such cases it's possible to use a generalized sensitivity to both factors of difficulty of positions, which depends on the average error. In order to find this, we first take the average rating of all opponents and look up its equivalent average expected error in the error-rating table. Secondly, we determine the relationship between average error and slopes for both types of difficulty, as shown in the graphs below. Similarly to graph 2, each data point represents a randomly selected dataset of 1500 positions.
8 http-BBwww.chessgames.comBperlBchess.plEtidP34@34H!pageP4Qreply575
[Graph 6: complexity slope depending on avg error. x-axis: avg error, y-axis: slope; trend line f(x) = 0.61 x^0.93.]

[Graph 7: difference slope depending on avg error. x-axis: avg error, y-axis: slope; trend line f(x) = 0.24 ln(x) + 0.79.]

[Graph 5: Dependence of performance on thinking time. x-axis: time coefficient, y-axis: elo performance.]
Slope increases with average error. Hence, if our opponent had an average expected error of 0.205, its complexity vs average error slope would be 0.61*0.205^0.93 = 0.14x and its difference vs average error slope 0.24*ln(0.205)+0.79 = 0.41x.

It's not necessary to include practical play if both sides of the games are analyzed. In that case differences in difficulty, suitability, thinking time etc. would cancel each other out. If a game is on average more difficult for one player by x hypothetical units, then his opponent has, at the same time, an easier game by -x units, and the sum is always zero. For this reason, practical play has only been included in the analysis of the games of the 7 most remarkable performances in the history of chess. As for the rest of the games, both sides have been taken into account.
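The two generalized sensitivities can be checked numerically. The formulas below are the trend lines of Graphs 6 and 7 as read here (a power curve for complexity and a logarithmic curve for difference); the function names are hypothetical and the snippet is only a sketch.

```python
import math


def complexity_slope(avg_expected_error):
    """Generalized sensitivity to complexity (power trend line, as read here)."""
    return 0.61 * avg_expected_error ** 0.93


def difference_slope(avg_expected_error):
    """Generalized sensitivity to difference (logarithmic trend line, as read here)."""
    return 0.24 * math.log(avg_expected_error) + 0.79


error = 0.205
print(round(complexity_slope(error), 2))
print(round(difference_slope(error), 2))
```

Both functions grow with the average expected error, which reflects the statement that weaker opponents are more sensitive to difficult positions.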
2.5 Finding the strength of play

Having determined the absolute accuracy of play and the aforementioned factors that affect it, it becomes possible to derive the expected error of players. This consists of the following steps:
1. Find the average expected error of players.
2. Establish a relation between a modern rating and the expected error.
3. Find the modern rating equivalent of the expected error.
4. Find out the rating losses/gains due to time control and practical play.

As a result we get a supposed present-day rating corresponding to the strength of play. Unfortunately, one must accept that full confidence can never be attained. The methods described here are by no means 100% reliable, as the approach is still in its infancy and the chess engines of today have limited abilities.

The expected error indicates a player's hypothetical accuracy of play (average error) if the difficulty of positions and the evaluation were exactly the same for all players. In this study the average complexity (0.58) and the average difference (0.53) of all moves valid for comparison were used to represent a common ground. The graph below, which shows how the accuracy of Capablanca and Kramnik changes as a function of complexity, also depicts how the expected error is determined with the help of linear trend lines.

As we can see, Capablanca's expected error by complexity is 0.070, and that of Kramnik is 0.084. Kramnik's positions were a little more complicated than average, and Capablanca's were far less complicated than the average of all positions; therefore the gap between their accuracies of play would be smaller if they both had positions of the same complexity. The expected error according to the difference is found by the same method. One can also note that Capablanca's accuracy of play has been less dependent on difficulty than Kramnik's.
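Reading the expected error off a linear trend line can be sketched as an ordinary least-squares fit evaluated at the common complexity. The function names and the sample data are hypothetical; the default of 0.58 is the average complexity as read here, and this is an illustration rather than the author's own tooling.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def expected_error(complexities, errors, common_complexity=0.58):
    """Error predicted by the player's own trend line at the common complexity."""
    slope, intercept = fit_line(complexities, errors)
    return slope * common_complexity + intercept


# Hypothetical per-position data for one player: (complexity, error).
complexities = [0.2, 0.4, 0.6]
errors = [0.05, 0.06, 0.07]
print(round(expected_error(complexities, errors), 3))
```

A flatter trend line, as described for Capablanca, means the expected error changes little when the player is moved to the common complexity.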
[Graph 8: Finding expected error. x-axis: complexity, y-axis: error; linear trend lines for Capablanca (New York 1927) and Kramnik (Kramnik vs Kasparov match 2000). Annotations: average complexity of Capablanca's positions, of Kramnik's positions and of all positions; actual average error of each player's moves; expected error of both players' moves; change in each player's accuracy due to changing difficulty.]
3. Results

The following section is divided into two parts. First, all the necessary data on all analyzed chess-playing entities is dealt with, and then, step by step, we find the hypothetical strength of play of each player on that basis.

The most important of these is, of course, the actual accuracy of play, i.e. the average error. The result of a game depends only on differences in the accuracy of play. However, it must be borne in mind that the average error never directly shows the level of chess skill, but rather remains biased towards players with a more positional style and longer time controls. The following graph displays all chess-playing entities sorted by average error.

As expected, most engines occupy the top spots.
It stands out that Capablanca had positions with by far the least difficulty. Much has been said about Fischer's simple style of play and his tendency to avoid complications. Indeed, according to graph 11, the complexity of his positions was below average in the games against Larsen and Taimanov. However, graph 10 shows that the average difference between the two best moves in Fischer's positions is above average. The fact that a position seems somewhat easy to us does not automatically mean that it is easy to find accurate moves. It is perhaps not surprising that correspondence games from chessgames.com have the lowest average evaluation, i.e. in those games the positions stayed equal longer due to the higher quality of play.

[Graph 12: Average evaluation of positions. Bar chart of all chess-playing entities; axis: evaluation.]

[Graph 13: Average expected error. Bar chart of all chess-playing entities sorted by average expected error.]

The graph above shows the average expected error, which is derived by taking the average of the expected errors by complexity and by difference, and which includes changes in the average error due to the evaluation. The results are more logical than those displayed on graph 9. Correspondence games are left out, as there is no point in measuring changes in the accuracy of play if its estimation cannot be trusted. As a rule in all kinds of measurements, the gauge must be of higher quality or trustworthiness than the things being measured. The methods used in this paper are simply not adequate for modern software-assisted correspondence games.

The next step is to take data from the previous graph to find the relationship between the rating and the quality of play. The relationship appears to be logarithmic. The black line depicts the approximate boundary of trustability below which engine output cannot be trusted. It is interesting to note that it crosses the trend line at 2931 ELO, which may indicate that the level of play of the combination of engine, hardware and time used here is equal to 2931 FIDE 2008. But that is naturally speculation which needs further research.
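A logarithmic relationship of this kind, and the rating at which it meets a given error level, could be sketched as follows. The cohort data are invented for illustration and are not the paper's figures; the function names are hypothetical, and 0.031 is the boundary of trustability as read here.

```python
import math


def fit_log_trend(ratings, errors):
    """Least-squares fit of error = a * ln(rating) + b; returns (a, b)."""
    xs = [math.log(r) for r in ratings]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(errors) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, errors))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def rating_at_error(a, b, error):
    """Invert the trend line: the rating whose expected error equals `error`."""
    return math.exp((error - b) / a)


# Hypothetical cohorts (rating, average expected error).
ratings = [1900, 2100, 2300, 2500, 2700]
errors = [0.26, 0.22, 0.19, 0.15, 0.11]
a, b = fit_log_trend(ratings, errors)
print(round(rating_at_error(a, b, 0.031)))
```

The same inversion serves step 3 of the procedure: once the trend line is known, any expected error can be translated into a modern rating equivalent.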
Players ranked according to thinking time:

The farther back in time, the longer the time controls are.

The next steps represent an attempt to factor in at least a fraction of the generally unfathomable and messy notion called practical play.
[Graph 14: The relationship between accuracy and FIDE rating 2008. x-axis: ELO rating (1600-3000), y-axis: average expected error; logarithmic trend line; boundary of trustability = 0.031.]

[Graph 15: Thinking time. Bar chart of all chess-playing entities sorted by time control.]
Players ranked according to relative difficulty of positions:

A negative value shows that the opponents had easier positions; a positive value means it was the other way around. As one could have expected, Kasparov and Lasker, players known for practical play, are situated at the top. Somewhat surprisingly, it appears that even Capablanca had both difficulty factors easier than his opponents. One of the reasons could be that the easier one's positions, the greater the probability that the opponent's positions are more difficult, regardless of the degree of practicality in one's play. Kramnik's positions were, as expected, easier than Kasparov's in the title match of 2000.

Before trying to find out exactly how much a difficulty differential influences the opponent's play, it is necessary to know the strength of the opponents. First we look up the FIDE or chessmetrics rating of that time and translate it into its contemporary rating equivalent.

The blue line represents actual data based on the analysis of randomly picked games (rating range 2600-2700) from each decade. The red line represents the strength of play of the top-rated players. The gap in each decade between a top-rated player and a 2650-rated player is based on the arithmetical averages of the January lists in the same decade. It can be seen that, if the logarithmic trend line can be trusted, top players first reached FM level (2300 ELO) as early as the mid-19th century. The level of an International Master (2400-2500) was achieved in the 1880-1890s. GM level was reached during the first decades of the XX century. Development was relatively quick-paced at that time. Top players were at today's Super GM level already in the 40s.
[Graph 16: Relative difficulty of positions. Bar chart of the selected performances; series: complexity, difference.]

[Graph 17: evolution of chess strength by decades. x-axis: decades (1830s-2030s), y-axis: FIDE 2008 rating (1800-3000); series: elo chessmetrics 2650 and elo chessmetrics highest, each with a logarithmic trend line.]
The graph below shows the opponents' ratings and their actual strengths.

Not surprisingly, Kramnik had the strongest opponent, against Kasparov in 2000. The weakest opposition was against Capablanca in New York 1924. By translating those ratings into average expected error and taking into account how the generalized sensitivity to either type of difficulty, as described in section 2.4, depends on it, it can be ascertained how much difficulty differentials affect performance.

The next graph shows the final conclusion of this work. The blue bars indicate ratings directly derived from the expected error, as shown on graph 14.
[Graph: ratings and actual strengths of the opponents in the selected performances; axis: rating (2200-3000).]
The winners according to this criterion are Hiarcs 12.1 and Carlsen in Nanjing 2009. It may seem surprising that his actual accuracy of play in that tournament was almost 100 points lower than his official TPR (3001). But it can be explained by two facts: against &s. It is hardly a surprise that at the bottom of the graph are situated those who should be there: 1900-rated players and blitz games of 2700-rated players.
4. Miscellaneous
In this section several additional interesting conclusions that the collected data offers will be provided.
4.1 Chessmetrics 3-year peak top 50
The table below compares 3-year peak ratings for each player, taken from the chessmetrics.com site, with their FIDE equivalents in 2008. The year indicates the middle year of the three-year period.
According to that table, the strongest level of play of all time was shown by Kasparov during 1989-1991, when his play supposedly would have been rated circa 2860 in 2008. Yet, it must be taken into account that this table is a bit
 #   name         year  chessmetrics  FIDE 2008
 1   Kasparov     1990  2874          2861
 2   Fischer      1972  2867          2817
 3   Capablanca   1920  2857          2664
 4   Lasker       1895  2855          2562
 5   Botvinnik    1946  2852          2738
 6   Alekhine     1931  2841          2684
 7   Karpov       1989  2833          2819
 8   Anand        1998  2822          2825
 9   Kramnik      2001  2815          2824
10   Pillsbury    1901  2806          2540
11   Maroczy      1906  2799          2554
12   Korchnoi     1979  2798          2763
13   Tarrasch     1895  2796          2503
14   Ivanchuk     1992  2794          2785
15   Steinitz     1885  2794          2451
16   Smyslov      1955  2793          2703
17   Petrosian    1962  2789          2716
18   Tal          1960  2786          2708
19   Rubinstein   1912  2781          2559
20   Reshevsky    1953  2776          2681
21   Najdorf      1947  2775          2664
22   Zukertort    1884  2774          2425
23   Keres        1956  2773          2685
24   Nimzowitsch  1929  2770          2607
25   Bronstein    1951  2770          2669
26   Spassky      1970  2767          2712
27   Kamsky       1995  2765          2762
28   Chigorin     1896  2763          2475
29   Marshall     1917  2759          2556
30   Leko         2001  2757          2766
31   Janowsky     1904  2757          2504
32   Fine         1940  2756          2626
33   Topalov      1997  2754          2755
34   Salov        1994  2754          2749
35   Gelfand      1992  2754          2745
36   Shirov       2000  2753          2760
37   Bogoljubow   1927  2753          2583
38   Geller       1963  2752          2681
39   Morozevich   2000  2751          2758
40   Euwe         1936  2750          2608
41   Adams        2001  2749          2758
42   Polugaevsky  1977  2748          2709
43   Beliavsky    1988  2747          2731
44   Timman       1989  2747          2733
45   Schlechter   1911  2747          2522
46   Portisch     1980  2746          2713
47   Stein        1966  2745          2681
48   Vaganian     1985  2744          2721
49   Jussupow     1987  2744          2725
50   Larsen       1970  2744          2689
misleading. It is often the case that a player's chessmetrics rating a decade or more later is only a few points below his peak, yet his play is nevertheless better due to the general rise in chess skills. For example, Lasker's chessmetrics rating in 1894 was 2878, but in 1917 it was 2860, whose 2008 equivalent would have been ca 2650. Kasparov's rating in 1999 was 2884, merely 2 points lower than his best, achieved in 1993, but there is no doubt that his actual quality of play had improved by that time.
4.2 Comparison of human and engine ratings
In this work there is enough data on both humans and engines for making interesting comparisons between such different types of players. Below is a side-by-side comparison of the relationships between the accuracy of play and the ratings of either type. CCRL ratings are given as of 17.10.2014. According to the site, the time control was chosen in such a way as to be equivalent to 40 moves in 40 minutes on an Athlon 64 X2 4600+ (2.4 GHz).
As we can observe, the trend lines are of opposite nature. The relationship between the accuracy of play of human chess players and the rating is logarithmic: on lower levels, the accuracy gaps are smaller than at the top. In the case of engines, on the other hand, it is completely opposite: exponential. It should be noticed how closely the trend line follows the actual line representing the accuracy of play: there is a stark contrast. It confirms what has long been known: computers' play is far more stable.
Based on the data on those two graphs, it is possible to compile conversion tables for finding one-to-one correspondences between both rating systems.
Graph 20: FIDE rating vs accuracy
[Figure omitted: "The accuracy of play and FIDE rating 2008"; x-axis: rating (1600 to 3000); y-axis: expected error (0 to 0.28)]
Graph 21: CCRL rating vs accuracy
[Figure omitted: "The accuracy of play and CCRL rating 2014"; x-axis: CCRL 40/40 rating (1500 to 3300); y-axis: expected error (0 to 0.28)]

CCRL 40/40   FIDE 2008 40/90+30
3400         2926
3300         2921
3200         2916
3100         2911
3000         2904
2900         2897
2800         2888
2700         2877
2600         2864
2500         2849
2400         2830
2300         2805
2200         2774
2100         2734
2000         2679
1900         2603
1800         2493
1700         2325
1600         2054

FIDE 2008 40/90+30   CCRL 40/40
2900                 2941
2850                 2507
2800                 2281
2750                 2136
2700                 2034
2650                 1957
2600                 1896
2550                 1847
2500                 1805
2450                 1770
2400                 1739
2350                 1712
2300                 1688
2250                 1667
2200                 1647
2150                 1630
2100                 1614
2050                 1599
2000                 1586
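The conversion tables above only list round rating values. As a minimal sketch, an intermediate CCRL rating could be converted by linear interpolation between neighbouring rows. The pairs below are copied from the first table; the interpolation itself and the names `CCRL_TO_FIDE` and `interpolate` are assumptions made for illustration, since the author does not say how values between rows should be obtained.

```python
from bisect import bisect_left

# (CCRL 40/40, FIDE 2008) pairs taken from the first conversion table.
CCRL_TO_FIDE = [
    (1600, 2054), (1700, 2325), (1800, 2493), (1900, 2603), (2000, 2679),
    (2100, 2734), (2200, 2774), (2300, 2805), (2400, 2830), (2500, 2849),
    (2600, 2864), (2700, 2877), (2800, 2888), (2900, 2897), (3000, 2904),
    (3100, 2911), (3200, 2916), (3300, 2921), (3400, 2926),
]


def interpolate(table, x):
    """Piecewise-linear lookup in a list of (x, y) pairs sorted by x.

    Linear interpolation is an assumption of this sketch, not the
    author's stated method. Values outside the table are rejected
    rather than extrapolated.
    """
    xs = [row[0] for row in table]
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("rating outside the range covered by the table")
    i = bisect_left(xs, x)
    if xs[i] == x:
        return float(table[i][1])
    (x0, y0), (x1, y1) = table[i - 1], table[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


if __name__ == "__main__":
    # A CCRL 2450 engine falls halfway between the 2400 and 2500 rows.
    print(interpolate(CCRL_TO_FIDE, 2450))
```

Because the underlying curve is strongly non-linear at the low end of the table, interpolated values there are rougher than at the top, where neighbouring rows differ by only a few points.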
Graph 22: human and engine rating comparison
[Figure omitted: "Relationship between FIDE and CCRL ratings"; x-axis: CCRL 2014 40/40; y-axis: FIDE 2008 40/90+30 (2000 to 3000)]
Graph 23: engine and human rating comparison
[Figure omitted: "Relationship between CCRL and FIDE ratings"; x-axis: FIDE 2008 40/90+30; y-axis: CCRL 2014 40/40 (1300 to 3100)]
At first sight, it may seem surprising that the best chess engines are, according to this, so weak compared to humans. But it must be taken into consideration that CCRL games are run on quite weak hardware and the rate of play is nearly three times quicker than the standard FIDE time control. It can be concluded from the graph that in the beginning it was quite easy for engines to make progress against humans, but with time it is getting increasingly harder. Note: comparisons were made on the assumption that humans play against engines as they would against other humans, i.e. not using any anti-computer strategies. Unfortunately there is not yet a reliable way to emulate anti-computer play and its effects.
The reasons why the relationships between the accuracy and strength of play are exactly like that are unknown to the author. One feasible reason could be that the nature of the curve is related to the relative importance of calculation versus evaluation. The larger the relative importance of calculation in the move-choosing process (engines), the steeper the exponential curve is: while gaining in rating points, there is an ever-decreasing rate in the accuracy gain. But if a player has a larger relative importance of evaluation (humans), then there is a contrary phenomenon: rating growth means a faster increase in the accuracy. If this is true, then there presumably must be some hypothetical mutual relationship between calculation and evaluation where the accuracy vs rating relationship is linear.
4.3 Rating inflation and deflation
On the first graph we can see that the gap has decreased from 60 points in 1970 to 37 points in 2014: 5 points per decade. Clearly recognizable are two 'mountains' and four 'valleys'. The valleys refer to periods where rating numbers were quite high because of a dominant player: Fischer, Kasparov (two periods), and Carlsen. The mountains mark periods with equality and no clear dominator. On the other graph there is a completely different situation. The ratings of the 1st players of the rating lists have been relatively stable since the 1890s, while the skill level has steadily been rising. In other words, what we see there is deflation in the chessmetrics rating system. The rate of deflation has decreased somewhat since the 1960s, which is logical, as the rate of improvement of playing skills must slacken over time. Here too, 'valleys' from domination periods can be seen. The rate of deflation is 608 points within 1843-2004: 3.78 points per year.
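The two rates quoted above follow from simple division. A quick check, assuming the figures are read as a FIDE gap shrinking from 60 to 37 points over 1970-2014 and a chessmetrics drift of 608 points over 1843-2004 (the helper name `rate_per_year` is made up for this sketch):

```python
def rate_per_year(total_change, first_year, last_year):
    """Average change per year over the closed interval of years."""
    return total_change / (last_year - first_year)


# Chessmetrics deflation: 608 points over 1843-2004.
deflation_per_year = rate_per_year(608, 1843, 2004)

# FIDE inflation: the gap shrank from 60 to 37 points over 1970-2014.
inflation_per_decade = 10 * rate_per_year(60 - 37, 1970, 2014)

if __name__ == "__main__":
    print(f"chessmetrics deflation: {deflation_per_year:.2f} points per year")
    print(f"FIDE inflation: {inflation_per_decade:.1f} points per decade")
```

Both results agree with the rounded values given in the text, which suggests the rates are plain endpoint averages rather than fitted slopes.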
4.4 Various trends
Previously we looked at how the strength of play had changed across history and the two distinct rating systems. However, the same can be applied to changes in other factors, such as slope and both factors of difficulty. Before having a closer look at those, a short introduction on the notion of 'slope' and what it actually indicates is presented below.
As persons more familiar with chess know, chess players can be split into two groups based on the nature of play:
1. positional, where intuition and knowledge prevail
2. tactical, where the speed of calculations, precision and creativity are most important
It is commonly known that the nature of play dictates the choice of openings and the type of positions, but differences are also present in players' tolerances with respect to the difficulty of positions and thinking time. If one tries to solve a problem by calculating variations and possible outcomes, then it generally takes a lot of time before a solution is reached. On the other hand, calculation is universal: it is suitable for solving problems of any type and any difficulty. The advantage of problem solving based on knowledge or intuition is speed. It takes almost no time to recall facts from memory or realize something via intuition. Its disadvantage is that it is only suitable for relatively simple and more familiar problems; when the solution is illogical and unexpected, it fails. From this, the following facts follow:
1. players of positional type are relatively less sensitive to thinking time, but more sensitive to the difficulty of positions
2. it is contrary with tactical players: they are less sensitive to the difficulty of positions, but more sensitive to thinking time
Hence the fact that the size of the slope of the relationship between average error and a factor of difficulty depends on player type. Tactical players have a smaller slope, positional ones a bigger one. Such a phenomenon may give us a simple method to find out which players have a bigger relative importance of calculation and which ones of intuition/knowledge in their move-finding processes.
The graph on the left shows the absolute average slope of all entities covered in this study. But, as can be seen on graphs 6 and 7, the size of slopes is dependent on the average error. Therefore, it is preferable to determine how
Graph 25: deflation in the chessmetrics rating system
[Figure omitted: "Chessmetrics rating deflation"; x-axis: year; y-axis: gap (-800 to 100); linear trend line with a slope of about 38 points per decade]
much the actual slope deviates from the expected slope. The formulas for calculating the relative slope from the expected slope are given here:
complexity relative slope = absolute slope / (0.61 ∗ avg error ^ 0.93)
and
difference relative slope = absolute slope / (0.4 ∗ ln(avg error) + 0.79)
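The two formulas above share one structure: an observed (absolute) slope divided by the slope expected at the player's average error. The sketch below illustrates that structure only. It assumes the complexity denominator is a power-law fit and the difference denominator a logarithmic fit, and it takes the fit coefficients as parameters rather than hard-coding them, because the printed constants are hard to read reliably. All function names and the numbers in the demo are hypothetical.

```python
import math


def expected_slope_power(avg_error, a, b):
    """Expected slope from an assumed power-law fit: a * avg_error ** b."""
    return a * avg_error ** b


def expected_slope_log(avg_error, a, b):
    """Expected slope from an assumed logarithmic fit: a * ln(avg_error) + b."""
    return a * math.log(avg_error) + b


def relative_slope(absolute_slope, expected_slope):
    """Ratio of the observed slope to the slope expected at that error level."""
    if expected_slope == 0:
        raise ZeroDivisionError("expected slope is zero; ratio is undefined")
    return absolute_slope / expected_slope


if __name__ == "__main__":
    # Hypothetical player: average error 0.16, observed complexity slope 0.20.
    # The coefficients (0.5, 1.0) are placeholders, not the paper's values.
    expected = expected_slope_power(0.16, a=0.5, b=1.0)
    print(relative_slope(0.20, expected))
```

A ratio near 1 means the player's sensitivity matches what is typical for that accuracy level; values well above or below 1 are what the sorted graph of relative slopes is meant to expose.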
The smaller the number, the bigger the importance of calculation. The graph on the right reveals that all chess engines are in the top half, as expected. Capablanca and Kramnik are situated in the bottom half, confirming the common belief that those players were primarily intuitive players. Perhaps surprisingly, Kasparov and Karpov stand close to each other, and Lasker is so far down.
The changes of the average relative slopes across time and both rating systems are presented below.
Graph 26: players sorted by absolute slopes
[Figure omitted: bar chart "Absolute slopes" for all players, engines, rating groups and decades covered in the study; series: complexity, difference, average]
Graph 27: players sorted by relative slopes
[Figure omitted: bar chart "Relative slopes" for the same entities; series: complexity, difference, average]
Graph 28: relative slope across time periods
[Figure omitted: "Relative slopes vs decades"; x-axis: decades (1860s to 2000s); y-axis: relative slope (0 to 2.5); series: average, complexity, difference, each with a logarithmic trend line]
It appears that the average relative slope slowly decreases with time; the rate of decrease is even bigger in the CCRL rating system. It exhibits a completely different behaviour in the FIDE rating system, where stronger players have bigger slopes. In other words, today's chess players have become more calculative than they were in the past. In a sense, it is logical: after My System by Nimzowitsch there has been no significant breakthrough in chess middlegame theory.
Positions have become somewhat easier compared to earlier times. Especially eye-catching is the low point between 1920 and 1940. Regarding FIDE rating, it looks like stronger players have a tendency to make positions more complicated. Better chess engines, on the other hand, end up playing in relatively easier positions.
4.5 Influence of errors
As a final part, here is data on the influence of errors of various magnitudes. The influence of an error is calculated by multiplying its frequency by its magnitude. By comparing the resulting number with those of other errors, it can be ascertained which magnitudes of errors are the biggest source of inaccurate play. On the graphs below each blue datapoint marks the product of the magnitude of an error and its frequency. The red line shows the moving average of 9 datapoints. The upper graph is based on data used in this study (1777 positions, average error 0.160), the lower graph represents data taken from an earlier study5 (14 174 positions, average error 0.132).
5 http://www.chessanalysis.ee/summary450.pdf
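The calculation described above (frequency times magnitude per error size, then a 9-point moving average) can be sketched as follows. This is a minimal illustration rather than the author's actual procedure: the 0.01 bin width, the centered form of the moving average and both function names are assumptions, and the sample errors are invented.

```python
from collections import Counter


def error_influence(errors, step=0.01):
    """Return (magnitude, influence) pairs on a dense grid of error sizes.

    Errors are rounded to the nearest multiple of `step`; the influence of
    a magnitude is its frequency multiplied by the magnitude itself. Empty
    bins are kept with influence 0 so the result can be smoothed directly.
    """
    bins = Counter(round(e / step) for e in errors)
    top = max(bins, default=0)
    return [(k * step, bins[k] * k * step) for k in range(1, top + 1)]


def moving_average(values, window=9):
    """Centered moving average; positions without a full window are dropped."""
    half = window // 2
    return [
        sum(values[i - half:i + half + 1]) / window
        for i in range(half, len(values) - half)
    ]


if __name__ == "__main__":
    # Invented per-move errors in pawns; zero errors contribute nothing.
    sample = [0.0, 0.02, 0.02, 0.05, 0.10, 0.10, 0.20, 0.20, 0.20, 0.90]
    influence = [value for _, value in error_influence(sample)]
    smoothed = moving_average(influence)
    print(max(influence), len(smoothed))
```

Keeping empty bins in the series matters for the smoothing step: if they were dropped, the 9-point window would span unequal ranges of error magnitude.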
Graph 33: complexity across FIDE rating
[Figure omitted: "Complexity vs FIDE"; x-axis: FIDE 2008 rating (1900 to 2700); y-axis: complexity (0.4 to 0.6)]
Graph 34: difference across FIDE rating
[Figure omitted: "Difference vs FIDE"; x-axis: FIDE 2008 rating (1900 to 2700); y-axis: difference (0.2 to 0.4)]
Graph 35: complexity across CCRL rating
[Figure omitted: "Complexity vs CCRL"; x-axis: CCRL 2014 rating (1800 to 3000); y-axis: complexity (0.4 to 0.6)]
Graph 36: difference across CCRL rating
[Figure omitted: "Difference vs CCRL"; x-axis: CCRL 2014 rating (1800 to 3000); y-axis: difference (0.16 to 0.26)]
Graph 37: influence of errors 1
[Figure omitted: "Influence of errors 1 (avg error 0.160)"; x-axis: error magnitude; y-axis: error sum (0 to 24); series: sum of errors, moving average of 9]
Despite the fact that the average error is significantly different in the two datasets, a side-by-side comparison reveals, as shown by the red moving average line, that the biggest influence is roughly equal: both peaks are around 0.20. Hence the question: will the main source of inaccurate play also remain the same if we have a separate look at engines and humans with roughly the same accuracy? On the following graphs, the overall average error of all engine moves is 0.084. Human moves were grouped in two: one group of players with an average error lower than 0.100 and one of those whose average error was higher than 0.200. The average errors of all moves combined were 0.071 and 0.243 respectively.
Graph 38: influence of errors 2
[Figure omitted: "Influence of errors 2 (avg error 0.132)"; x-axis: error magnitude; y-axis: error sum (0 to 26); series: sum of errors, moving average of 9]
Graph 39: influence of engine errors
[Figure omitted: "Influence of engine errors (avg error 0.084)"; x-axis: error magnitude; y-axis: error sum (0 to 4.5); series: sum of errors, moving average of 9]
Graph 40: influence of human errors 1
[Figure omitted: "Influence of human errors 1 (avg error 0.071)"; x-axis: error magnitude; y-axis: error sum (0 to 4.5); series: sum, moving average of 9]
The results might be described as remarkable. All three graphs show that, irrespective of the accuracy of the moves, the error influence peak is persistently situated around the 0.20 mark. True, in the case of engines, there is a small oddity: the peak seems to have a little cavity centered around 0.20, with two ridges surrounding it. It is presently unknown what may be the cause of that. These graphs also explain why it is not advisable to use threshold-based analysis, at least not with threshold values of 0.10 and above. At first glance, small to medium-sized errors may seem insignificant, but what they lack in gravity, they make up for by being more numerous.
5. Conclusion and future perspectives
In this study the author, using Rybka 3, tried to measure the objective strength of play and to determine its relationship with either type of rating system. Besides that, the aim was to record the change of the strength of play through time. It could be compared to athletics world record progression or world leading mark tables, which provide a good overview of development in athletics. Unlike many sports today, chess has been played for centuries, and the level of play has long been very high. As became clear earlier, already by the beginning of the previous century top players were on a par with today's weaker GMs. In the light of this, John Nunn's speculation that Hugo Süchting at the Karlsbad tournament in 1911 was merely a 2100-rated player should be regarded as a serious underestimation. According to the chessmetrics rating after the tournament, he was only 253 points short of Lasker, which would rate the latter ca 250 points lower than can be seen on graph 17. Due to the fact that so far there has been no reliable method for measuring the quality of play, such a phenomenon as overhyping of players of the past has gained ground. Surprisingly many people seriously think that former great figures were at least as talented as today's top players, and that they played as well or even better. This is psychologically completely understandable, if practically unnecessary: we all tend to see the past as more beautiful than it actually was. Players of the past are being overrated also because, out of all their games, there is a tendency to selectively highlight the better specimens, whereas in the case of contemporary players, various sites providing live engine-assisted analysis display their average level. And since the population of the world and the number of chess players back then were smaller, the same thing can be said about the talent pool. It is more probable to find more naturally talented players in a larger pool.
Comparison between the CCRL and FIDE rating systems gave a surprising conclusion. Before that, the author held the opinion that both systems had an analogous relationship between strength and accuracy. It turns out that the relationships are of opposite nature. The accuracy of humans decreases logarithmically with strength of play, at a gradually diminishing rate, but the accuracy of play of engines, on the other hand, decreases at an exponential rate. The fact that engines from the bottom part of the rating list are weak was noted long ago. One conclusion that can be made is that it is virtually impossible to reach negative ratings in engine rating systems, whereas it is very easy in the FIDE rating system. The weak point in the conclusion is that, because of the lack of proper methods, it was not possible to reckon with the impact of anti-computer strategy on results. In the future it would be necessary to devise methods to describe and research it more closely, including how it depends on the strength of engines and the depth of search.
Problematic is the relative instability of human play, which is clearly illustrated on graph 20. It makes the conclusions somewhat untrustworthy. Therefore an increase in the number of analyzed moves per player is recommended.
The most difficult part in such analysis work is obviously practical play. There were no satisfactory outcomes regarding that. It was found that a phenomenon that could be called 'objectivity-practicality bias' is still present in the results. Therefore the results of players whose difficulty of positions was far from the average must be regarded with caution. The more difficult the positions, the bigger the probability that a player's result according to the analysis turns out to be underrated, and in the case of positions far below average difficulty, the results will generally be overrated. Previously we saw that practical play is
Graph 41: influence of human errors 2
[Figure omitted: "Influence of human errors 2 (avg error 0.243)"; x-axis: error magnitude; y-axis: error sum (0 to 10); series: sum, moving average of 9]
based on differentials in the three categories: difficulty of positions, type of positions and thinking time. But there is also another interesting, at least theoretical, way. It is based on the fact that moves can be characterized not only by quality, but also by how easy or hard they are to notice. Some moves are fairly obvious, others can seem quite illogical and at first glance wrong. The fact that there are a lot of positions where the best (often the only) move is extremely hard to see is one of the chief reasons why chess is such a difficult and fascinating game. Here we have arrived at the problem of measuring the obviousness of moves. On what basis and how to measure it? erez in Thessaloniki 2013 and Kramnik in London Classic 2011 were also remarkable performances. There exists a small possibility that their actual quality of play surpasses that of Carlsen. These three performances definitely deserve further scrutiny. The closer a score is to 100% and the fewer games played, the more unreliable the TPR value will be. Thus it would be interesting to look into discrepancies between TPR and quality of play as a function of score and the number of games. It is worth paying attention to the tournaments of Lasker and Capablanca in New York in 1924 and 1927. Despite the fact that they both had almost equal chessmetrics TPRs, as shown on graph 19, the difference in the quality of play caused by the objectivity-practicality bias is 205 points. Taking into account the large number of games and the relatively small timespan between them, it is quite probable that the quality of play of both players closely corresponds to the TPRs. For this reason, comparing the games of Lasker and Capablanca from the New York tournaments would presumably be a good indicator of the trustworthiness of the methods used for analysis.
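The remark above that TPR becomes unreliable as the score approaches 100% can be illustrated with the standard logistic Elo expectation, inverted to give a performance rating. This is only a sketch under that assumption: FIDE and chessmetrics compute performance ratings by their own rules, and the function name and the example numbers are made up.

```python
import math


def logistic_tpr(avg_opponent_rating, score, games):
    """Performance rating from the logistic Elo curve (an assumed model).

    Solves p = 1 / (1 + 10 ** (-d / 400)) for the rating difference d,
    where p is the fraction of points scored. The result is undefined
    for 0% and 100% scores, where d diverges.
    """
    p = score / games
    if p <= 0 or p >= 1:
        raise ValueError("performance rating is undefined for 0% or 100% scores")
    return avg_opponent_rating + 400 * math.log10(p / (1 - p))


if __name__ == "__main__":
    # Hypothetical 10-game event against 2700-rated opposition.
    for score in (7.5, 9.0, 9.5):
        print(score, round(logistic_tpr(2700, score, 10)))
```

In this model a single extra half point at 9/10 moves the estimate by more than 100 points, while the same half point near a 50% score moves it by only a few dozen, which is why short events with near-perfect scores give such volatile TPR values.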
Results also showed that the FIDE rating has been inflating with respect to absolute strength since 1970, at an average rate of about 5 points per decade. It is still relatively modest, which may explain why K.