
Inconsistencies of Recently Proposed Citation Impact Indicators and How to Avoid Them

Michael Schreiber
Institute of Physics, Chemnitz University of Technology, 09107 Chemnitz, Germany. E-mail: [email protected]

Received February 7, 2012; revised March 16, 2012; accepted March 19, 2012. © 2012 ASIS&T. Published online in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/asi.22703. Journal of the American Society for Information Science and Technology, ••(••):••–••, 2012.

It is shown that under certain circumstances, in particular for small data sets, the recently proposed citation impact indicators I3(6PR) and R(6,k) behave inconsistently when additional papers or citations are taken into consideration. Three simple examples are presented, in which the indicators fluctuate strongly and the ranking of scientists in the evaluated group is sometimes completely mixed up by minor changes in the database. The erratic behavior is traced to the specific way in which weights are attributed to the six percentile rank classes, specifically for the tied papers. For 100 percentile rank classes, the effects will be less serious. For the six classes, it is demonstrated that a different way of assigning weights avoids these problems, although the nonlinearity of the weights for the different percentile rank classes can still lead to (much less frequent) changes in the ranking. This behavior is not undesired, because it can be used to correct for differences in citation behavior in different fields. Remaining deviations from the theoretical value R(6,k) = 1.91 can be avoided by a new scoring rule: the fractional scoring. Previously proposed consistency criteria are amended by another property of strict independence at which a performance indicator should aim.

Introduction

Recently, there has been a controversy about the best way to normalize citation impact indicators. The intensive discussion in the literature in the last two years will not be repeated here; Leydesdorff, Bornmann, Mutz, and Opthof (2011) provide a good overview. The core issue of that debate was about replacing the rate of averages with the average of rates. Finally, two of the main contributors joined forces and proposed the use of alternative indicators based on nonparametric scores. They applied it to a set of seven individuals (Leydesdorff et al., 2011) as well as to two larger sets of journals (Leydesdorff & Bornmann, 2011). However, the latter version is meant to enable one to compare citation impact not only among journals but also among institutions or individuals.

Rousseau (2012) noted

that the I3 is not a consistent . . . indicator. This is because adding a new article changes the reference set, and hence, each element's percentile and all class borders. It is then always possible—by exploiting these small changes—to prove inconsistency. (p. 419)

When I tried to apply the indicators to a rather small number of scientists with not-so-large citation records, specifically the 26 physicists whose citation records I previously studied in detail (Schreiber, 2008, 2009), I observed such inconsistencies: Small changes in the database led to unexpected behavior of the indicators. To illustrate these problems, I constructed a few simple examples (presented later). Thus, I show that additional papers and/or citations can lead to strongly fluctuating values of the indicators and can upset the ranking of scientists in the evaluated group, even if the citation records of these scientists remain unchanged. In several instances, these changes violate a consistency criterion recently proposed by Waltman and van Eck (2012), but in a variety of instances from my examples, the counterintuitive behavior that I call inconsistent is not covered by the criteria in that article. Rather, a new criterion is needed, as proposed in the final section of the present article.

Calculation of the Citation Impact Indicators I3(6PR) and R(6,k)

The new indicators are based on the determination of percentile ranks, which are identified for each publication. Specifically, Leydesdorff and Bornmann (2011) proposed the "counting rule that the number of items with lower citation rates than the item under study determines the percentile" (p. 2137). In Table 1, this is applied to my first example, Data set A, which comprises 40 publications. For the present purpose, it is not yet necessary to distribute these publications among several individuals; this can be done later. At present, the table remains easier to survey if only the total data set is described and if all papers with the same number of citations are collected in one group. Therefore, in Table 1, the number n(c) of publications with a certain citation frequency c is given.

Altogether, there are n_tot = Σ_c n(c) = 40 papers with n*_tot = Σ_c c·n(c) = 52 citations. For every group, the number n< of papers with fewer citations than each paper in this group received is determined. This is then expressed as a percentage. In accordance with Leydesdorff et al. (2011), "tied, that is, precisely equal, numbers of citations thus are not counted as fewer than" (p. 1372). The rounded integer numbers of the percentiles thus apply to all papers in the respective group because they have exactly equal citation frequencies.

Following the categorization used by the National Science Board (2010) for Science and Engineering Indicators (their Appendix Table 5-43), Bornmann and Mutz (2011) suggested categorizing the papers into six percentile rank (PR) classes; namely, the best 1, 5, 10, 25, 50, and the remaining 50%.[1] Specifically, papers with a determined percentile less than the 50th (or equivalently, a percentage <50.0) are included in Class 1, and papers with a percentile less than the 75th, 90th, 95th, and 99th percentiles are aggregated in Classes 2, 3, 4, and 5, respectively, if they do not belong to a lower rank class. The top Class 6 comprises the 99th and 100th percentiles. These label numbers w of the six PR classes are then employed to weight each publication, as shown in Table 1. Using this scoring rule, the sum over all publications was taken by Leydesdorff and Bornmann (2011) to define the integrated impact indicator I3(6PR). These authors also used the percentiles directly as weights between 1 and 100 to define I3(100PR); this will not be done in the present investigation. For the example Case A1 in Table 1, one obtains I3(6PR) = Σ_c w(c)·n(c) = 76 by adding the values in the last line of the table. Normalizing the result by the total number of papers yields the citation impact indicator R(6,k) when the same six PR classes are used (Leydesdorff et al., 2011). Thus, R(6,k) = 76/40 = 1.9 for the example in Table 1.
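To make the procedure concrete, the following minimal Python sketch (my own illustration, not code from the cited papers; the names pr_weight and i3_6pr are mine) applies the "fewer than" counting rule with the floor-based rounding discussed in footnote 1 and reproduces the values of Table 1:

```python
from math import floor

# Upper percentile limits of PR classes 1..5; percentiles at or above 99
# fall into the top class 6. The class label doubles as the weight w.
BOUNDS = [50, 75, 90, 95, 99]

def pr_weight(percentile):
    """Map a percentile (0..100) to its PR class label, i.e., weight 1..6."""
    for w, b in enumerate(BOUNDS, start=1):
        if percentile < b:
            return w
    return 6

def i3_6pr(counts):
    """counts: {citation frequency c: number of papers n(c)}.
    Tied papers all get the percentile of the lowest possible rank."""
    n_tot = sum(counts.values())
    i3 = 0
    for c, n in counts.items():
        n_less = sum(m for d, m in counts.items() if d < c)
        i3 += pr_weight(floor(n_less * 100 / n_tot)) * n
    return i3, i3 / n_tot   # (I3(6PR), R(6,k))

print(i3_6pr({0: 20, 1: 10, 3: 6, 5: 2, 7: 2}))   # Case A1: (76, 1.9)
```

Feeding the modified data sets discussed below into the same function reproduces the values given in Tables 2 and 3.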

Inconsistencies of the Indicators for Example A

Due to the relatively small number of source items in the example and the relatively small number of citations, a large number of ties occurs. It is shown later that the specific way of treating the tied papers can lead to counterintuitive and inconsistent behavior of I3(6PR) [and thus also of R(6,k)], which is aggravated by the nonlinear behavior due to the use of six PR classes. If 100 PR classes are used, one can in principle construct similar examples, but the occurrence of inconsistencies is much more unlikely, and the effects are smaller.

The inconsistencies already occur for the present Example A when one assumes that one of the uncited papers in the example receives more and more citations while the citation rates of the other papers remain unchanged.

If one of the uncited papers receives its first citation, then the resulting indicator I3(6PR) drops significantly; namely, to the value 66. The derivation of this result is shown in Table 2, in analogy to Table 1.[2] Assuming that this publication receives additional citations, the I3(6PR) indicator shows a fluctuating behavior, as demonstrated in Table 3.

[1] In this case, it appears to be more appropriate not to round the percentages to integer percentiles because such a rounding would, for example, put the 49.5% into the 50th percentile and thus into the second-lowest PR class instead of the lowest PR class. Therefore, one should rather take the rational number of the percentage or "round" noninteger values to the next-lower integer number (i.e., use the floor function) to define the percentile. The discussed rounding procedure does not influence the values in Table 1.

[2] Here, the usual rounding of the percentage 47.5 would put the singly cited papers into the 48th percentile, while cutting by the floor function would put them into the 47th percentile, without any change in the rank class and thus without any change for I3(6PR).

TABLE 1. Determination of the indicator I3(6PR) for Case A1 of Example A, comprising n(c) papers with c citations each. There are no papers with c = 2, 4, 6, 8, or more citations, so the respective columns are omitted.

Example A, Case A1                        Citation frequency c
                                        0     1     3     5     7   Total
No. n(c) of papers                     20    10     6     2     2      40
Citations c*n(c)                        0    10    18    10    14      52
No. n< of papers with fewer citations   0    20    30    36    38
Percentage n<*100/n_tot                 0    50    75    90    95
Rank class = weight w                   1     2     3     4     5
Weighted no. of papers w*n(c)          20    20    18     8    10      76

TABLE 2. Same as Table 1, but for Case A2, in which one uncited paper from A1 received its first citation.

Example A, Case A2                        Citation frequency c
                                        0     1     3     5     7   Total
No. n(c) of papers                     19    11     6     2     2      40
Citations c*n(c)                        0    11    18    10    14      53
No. n< of papers with fewer citations   0    19    30    36    38
Percentage n<*100/n_tot                 0  47.5    75    90    95
Rank class = weight w                   1     1     3     4     5
Weighted no. of papers w*n(c)          19    11    18     8    10      66


As the number of papers remains constant, the normalized indicator R(6,k) behaves in the same way. Overall, a strong decrease in these indicators is achieved, although the impact of the data set as measured by the number of citations is increasing. I consider this behavior counterintuitive; in my view, therefore, the indicators I3(6PR) and R(6,k) are inconsistent.

A closer inspection of Tables 1 and 2 reveals the reason for the strong drop of the indicators: For a large number of papers (viz., those with one citation each), the percentage n<*100/n_tot changes from 50.0 to 47.5, and thus their PR class changes so that their weight decreases from 2 to 1. This is due to the specific treatment of tied papers, which are all grouped into the lower PR class because there are not enough papers with fewer citations. This can be seen in Table 2, where the 11 papers with one citation each are all grouped into the 47th percentile and thus into the bottom-50% class, although together with the 19 uncited papers they amount to 75% of the entire sample.

Leydesdorff and Bornmann (2011) stated that they "wish to give the papers the benefit of the doubt" (p. 2137) by providing tied papers with the highest possible ranks. Note that this is in contradiction to their stated strict counting rule. Overruling the counting rule and giving these 11 papers the highest possible rank by attributing them to the 72nd percentile would give each of them a weight of 2 and result in a value of 77 for the indicator I3(6PR). Thus, this would make the behavior of I3(6PR) look more consistent when comparing Tables 1 and 2 because an additional citation would lead to only a small increase of I3(6PR). [Ideally, I3(6PR) should not change at all for a different number of citations.] However, if one changes the citation numbers in Table 1 in the opposite way, namely to 21 uncited papers and 9 papers with one citation each, then all the 21 uncited papers would end up in the 50th percentile due to "the benefit of the doubt" and thus would all get a weight of 2, resulting in a total I3(6PR) = 96. So the inconsistency would not only remain but even be enhanced.
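For illustration, the "benefit of the doubt" variant corresponds to counting all other tied papers as having fewer citations. A hypothetical i3_6pr_high along these lines (my own naming, reusing pr_weight from the earlier sketch) reproduces the values 77 and 96 derived above:

```python
from math import floor

def i3_6pr_high(counts):
    """Tied papers get the highest possible rank ("benefit of the doubt")."""
    n_tot = sum(counts.values())
    i3 = 0
    for c, n in counts.items():
        # the n - 1 other tied papers are counted as having fewer citations
        n_less = sum(m for d, m in counts.items() if d < c) + (n - 1)
        i3 += pr_weight(floor(n_less * 100 / n_tot)) * n
    return i3

print(i3_6pr_high({0: 19, 1: 11, 3: 6, 5: 2, 7: 2}))   # Case A2: 77
print(i3_6pr_high({0: 21, 1: 9, 3: 6, 5: 2, 7: 2}))    # reversed case: 96
```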

Pudovkin and Garfield (2009) used a different scheme to treat tied papers, assigning the average of the ranks to every paper in a tied set. Aside from rounding effects, this is equivalent to averaging the percentile values and also does not solve the underlying problem because, again, the assignment of a large number of tied papers to one or the other PR class can be determined by a single (additional) citation. Changing the present example by starting with 12 uncited papers and 18 papers with one citation each, the respective percentile classes for the second group would range from 30 to 72 and give an average value of 51. This would put these papers into PR class 2 and attribute a weight of 2 to each of them, yielding a contribution of 36 to I3(6PR) in addition to 12 points from the 12 uncited papers. A single additional citation to a previously uncited paper would put the now 19 papers with one citation each into the percentile classes from 27 to 72, with an average of 49.7, and therefore give each of these a weight of 1, resulting in a contribution of only 19 to I3(6PR) in addition to 11 points from the 11 remaining uncited papers. Again, an additional citation has led to a strong reduction of I3(6PR).

How to Avoid the Inconsistencies by Attributing Averaged Weights

A solution for the present problem is not to take the lowest possible rank as in Tables 1 and 2 or the highest possible rank as suggested by Leydesdorff and Bornmann (2011), or to average the ranks as proposed by Pudovkin and Garfield (2009), but rather to average the weights of the tied papers. This means initially using different ranks for the papers (without caring about the sequence of tied papers) and attributing respective weights according to the utilized scoring rule. Then, the arithmetic average is taken over the assigned weights for the tied papers, and this average weight is given to all the tied papers, which will thus get the same value so that their sequence does not matter. This means that noninteger weights are utilized; thus, the demonstrated strong jumps in the contribution of tied papers to the overall value of the indicator, which I now label I3av(6PR) [and therefore also to the normalized Rav(6,k)], are avoided. In the present example, we obtain I3av(6PR) = 76 and Rav(6,k) = 1.9 for all cases in Table 3. It may be considered a disadvantage that the indicators do not change although the number of citations changes.

TABLE 3. Development of the indicators I3(6PR) and R(6,k) if one uncited paper from Case A1 receives more and more citations, as specified step by step by the number of papers for the different citation frequencies.

        No. of papers with citation frequency c    Total no.  Total no.
Case    0   1   2   3   4   5   6   7   8           of papers  of citations  I3(6PR)  R(6,k)
A1     20  10   0   6   0   2   0   2   0              40          52           76     1.90
A2     19  11   0   6   0   2   0   2   0              40          53           66     1.65
A3     19  10   1   6   0   2   0   2   0              40          54           67     1.68
A4     19  10   0   7   0   2   0   2   0              40          55           61     1.53
A5     19  10   0   6   1   2   0   2   0              40          56           62     1.55
A6     19  10   0   6   0   3   0   2   0              40          57           60     1.50
A7     19  10   0   6   0   2   1   2   0              40          58           61     1.53
A8     19  10   0   6   0   2   0   3   0              40          59           59     1.48
A9     19  10   0   6   0   2   0   2   1              40          60           60     1.50


But this is not surprising because the indicators are based on the ranking of the papers rather than on the actual number of citations. Therefore, besides artificial effects due to the specific way of grouping and weighting tied papers and besides rounding effects due to small numbers in the finite data set, the indicators for the complete data set should not depend on the number of citations. Due to the normalization, Rav(6,k) should not even depend on the number of papers. Of course, for different subsets representing individual scientists, there will be changes, as demonstrated later.
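A minimal sketch of this averaging procedure (again my own illustration, assuming pr_weight from the first sketch; the paper at overall rank r is treated as having r - 1 papers below it):

```python
from math import floor

def i3_av_6pr(counts):
    """Average the PR-class weights over each group of tied papers."""
    n_tot = sum(counts.values())
    i3, rank = 0.0, 0
    for c in sorted(counts):                 # ascending citation frequency
        n = counts[c]
        # weights the n tied papers would get at ranks rank+1 .. rank+n
        ws = [pr_weight(floor((rank + i) * 100 / n_tot)) for i in range(n)]
        i3 += sum(ws)                        # equals n times the average weight
        rank += n
    return i3, i3 / n_tot                    # (I3av(6PR), Rav(6,k))

print(i3_av_6pr({0: 20, 1: 10, 3: 6, 5: 2, 7: 2}))   # Case A1: (76.0, 1.9)
print(i3_av_6pr({0: 19, 1: 11, 3: 6, 5: 2, 7: 2}))   # Case A2: (76.0, 1.9)
```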

Another Example with Increasing Number of Publications

If one does not keep the total number of papers in the sample constant but allows additional uncited papers to enter the investigation, the inconsistencies of the indicators can be even stronger. This is visualized in Figure 1, where I present a second Example B, starting with Case B1, which comprises seven singly cited papers, four doubly cited papers, and four triply cited papers (i.e., 15 publications total, with 27 citations altogether). The histogram in Figure 1 illustrates the number of papers for each citation frequency.

Adding another paper without any citation makes the indicator I3(6PR) jump from 19 to 28 and, likewise, R(6,k) from 1.27 to 1.75. Adding further uncited papers leads to an expected steady, but rather small, increase of I3(6PR). This is shown for Cases B2 to B15 in Figure 1. At the same time, R(6,k) decreases because of the normalization by means of the total number of papers.
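The initial jump can be checked with the i3_6pr sketch given earlier (assuming that function is in scope):

```python
b1 = {1: 7, 2: 4, 3: 4}          # Case B1: 15 papers, 27 citations
print(i3_6pr(b1))                # (19, 1.2666...)
b2 = {**b1, 0: 1}                # Case B2: one additional uncited paper
print(i3_6pr(b2))                # (28, 1.75)
```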

The next additional uncited paper leads to a jump of I3(6PR) from 85 to 98 because suddenly all singly cited papers have reached the second PR class. This is Case B16 in Figure 1. The jump can be compensated by attributing a first citation to one of the uncited papers (see Case B17 in Figure 1).

In the following cases, I have added either a single uncited paper or a single citation to the example. In the latter situation, the top of the histogram in Figure 1 remains flat, but one (and only one) of the internal boundaries in the histogram displays a step. The resulting behavior of I3(6PR) in Figure 1 appears rather erratic. Admittedly, I have selected the specific increases in such a way as to maximize the fluctuations that can be seen in Figure 1. Overall, the increase of I3(6PR) remains intact, but there are strong spikes superimposed. The strongest peaks occur for Cases B63 and B70, where an additional uncited paper lets I3(6PR) jump from 75 to 98 and from 80 to 111, respectively. Two and three steps, respectively, with additional citations are needed to compensate. Figure 1 ends with Case B73, comprising 29 uncited, 15 singly cited, 9 doubly cited, and 7 triply cited papers (i.e., 60 papers total, with 54 citations altogether). Of course, this development from Case B1 to Case B73 is an extreme example, but it shows how frequently unexpected inconsistencies can occur.

As mentioned earlier, these problems do not appear when one averages over the assigned weights for tied papers instead of assigning one weight for the averaged rank, lowest rank, or highest rank of the tied papers. This can be seen in Figure 1, where the indicator I3av(6PR) based on the averaged weights also is plotted. It increases predictably: Each additional paper leads to a small increase, but these steps unfortunately are not all identical due to the aforementioned finite-size effects. If the number of papers does not change, then I3av(6PR) also remains constant, as discussed earlier.

The respective behavior of the indicator R(6,k) is presented in Figure 2a. Here, the previously mentioned jump of R(6,k) from Case B1 to Case B2, the subsequent decrease for the cases up to Case B15, the following jump for Case B16, and the subsequent erratic behavior with large fluctuations, strongest for Cases B63 and B70, can be easily identified.

FIG. 1. Evolution of indicators I3(6PR) (thin black line with crosses) and I3av(6PR) (thick green line) with increasing number of papers and citations for Example B, where the number of papers with 0, 1, 2, and 3 citations is given by the histogram (from top to bottom). [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]


FIG. 2. Evolution of the indicator R(6,k) (thin black line with crosses) for Example B and contributions of the four scientists H, L, M, and N as described in the text (a). Corresponding changes in the ranking of the four scientists, with the average value given for tied ranks (b). The symbols below the x axis denote the changes in the data set: A triangle indicates an additional paper for Scientist N; a plus, cross, or star means an additional citation to a previously uncited, singly cited, or doubly cited paper of Scientist N, respectively. By construction of the example, exactly one of these changes occurs between subsequent cases. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]


The curve for the averaged indicator Rav(6,k) in Figure 3a is much smoother, although there are still some fluctuations due to the discretization for the small finite numbers, as discussed earlier. Accordingly, with an increasing number of papers, these fluctuations become smaller, and the indicator converges toward the theoretical value 1.91, which is given by the proportions of the PR classes and their weights: 50% × 1 + 25% × 2 + 15% × 3 + 5% × 4 + 4% × 5 + 1% × 6 = 1.91.

Assigning the Papers of Example B to Four Individuals in a Simple Pattern

The demonstrated fluctuations of the indicators are strange, but they may not be considered a serious problem because one might be tempted to allow such fluctuations for the description of the complete database used as a reference set. However, I will now demonstrate that these fluctuations are strongly reflected in the behavior of the indicators for different subsets of the database. These subsets can be interpreted as the citation records of individual researchers contributing to the complete sample, which might be interpreted as the accumulated data for the institution of these researchers being used as the reference set for the evaluation. It will be shown that this partitioning can lead not only to erratic behavior of the indicators for the subsets but also to chaotic evolution of the ranking of the subsets (i.e., the individuals). It also will be shown that the proposed averaging of the weights will smooth the fluctuations as expected. For this purpose, I return first to Example B and Figure 2.

Figure 2 shows what can happen if the papers are distributed among four scientists. This example is based on the evolution of the paper and citation counts in Example B and Figure 1. I have apportioned the papers in Example B to four scientists—H, L, M, N—in a very simple way, attributing the four papers with three citations each to the highly cited individual, H, the four papers with two citations each to the moderately cited scientist, M, and the seven papers with only one citation to the least-cited researcher, L. The papers and citations added during the evolution of Example B from Case B1 to Case B73 in the previous section are all ascribed to the new colleague, N, who is a rising star; thus, it is no surprise that the indicator R(6,k) in Figure 2a reflects this rise. However, again, unreasonably strong fluctuations can be seen. Even stranger is the behavior of the resulting values of the individual indicators R(6,k) for the three colleagues. Although they do not publish anything and also do not receive any citations, the first paper of N already leads to a strong increase of the indicators for H and M. The following smooth decrease for H, M, and L and the respective increase for N up to Case B15 are reasonable. But then the indicators fluctuate so strongly that in many cases the rank order of H, M, and L is mixed up, although they still do not publish and also are not cited anymore.

In Figure 2b, the ranks are visualized, and the erratic behavior becomes even clearer. These cases demonstrate that unreasonable changes in the ranking of the scientists occur, which means not only that the total value of R(6,k) behaves in an inconsistent way but also that this leads to inconsistencies in the ranking of individual contributors.

Figure 3a shows that the previously proposed averaging of the weights leads to a much smoother behavior of the indicators, although some undesirable fluctuations remain due to the discretization effects for the small finite numbers of papers. But as Figure 3b demonstrates, the ranking of the scientists is now free from unexpected fluctuations: The newcomer N rises from the lowest rank to the top. The change of ranks between M and L can be explained by looking into the details of the calculation. In the beginning, in Cases B2 and B3, the seven singly cited papers of L each obtain a weight of 1, while the four doubly cited papers of M are weighted with a factor of 2, leading to a slightly smaller contribution of seven to I3av(6PR) for L in contrast to eight for M. At the end of the evolution of Example B, the respective weights are 2 and 3, so that the contribution of 14 to I3av(6PR) for L is slightly larger than the contribution of 12 for M. This leads to an exchange of the ranks between M and L; compare Cases B3 and B8 in Figure 3b, with tied ranks in between.[3] In principle, this behavior also could be called inconsistent because the ranking changes for two scientists whose publication numbers and citation records do not change. The fundamental reason for this behavior is the nonlinearity of the indicator. With the increasing number of uncited papers in the total data set, the papers of L and M move to higher PR classes; due to the nonlinearity, the relative weights change from the ratio 1:2 to 2:3. So this is a different problem from the previously solved difficulty concerning the treatment of tied ranks. This may not even be a problem but rather a desired property that enables a meaningful distinction between different fields, where it should be possible that different reference sets lead to different relative weights (discussed later).

Assigning the Papers to the Four Scientists in a More Complicated Pattern

Finally, I present a further Example C, which also is based on the evolution of the paper and citation counts in Example B. The purpose of Example C is to display an even more chaotic behavior and to demonstrate that this also is appropriately smoothed by the proposed procedure of averaging the weights. Moreover, this example also will be used in the next section to discuss the consistency of R(6,k) when two scientists achieve the same performance improvement.

In Example C, I have interrupted the advance of the rising star N after 10 publications and allowed the other scientists to add some uncited publications to their publication records.

[3] There also is a tie between M and L in Case B1 because the papers of M get a weight of 1.75 due to the averaging.


FIG. 3. Same as Figure 2, but for the indicator Rav(6,k) (i.e., for averaged weights). [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]


However, further citations are still attributed only to N. The actual sequence of additional papers and citations is indicated below the horizontal axis in Figure 4. The total number of papers and citations in each case does not change in comparison to the previous figures; only the attribution of the uncited papers changes. I have tried to select these changes in such a way that the erratic behavior of the indices for the different scientists is maximized, but with the aim of ending the evolution with the same number of papers for everyone (i.e., 15 papers for each scientist).

The wild fluctuations of R(6,k) in Figure 4a show that I have been fairly successful. The respective rankings are visualized in Figure 4b. For example, a single additional paper for N in Case C16 has moved Scientist L from the lowest rank in C15 to the top in C16, but this windfall profit is immediately lost when that publication of N receives a citation (see Case C17). In Case C19, it is at least an additional paper of L that has led to the same jump of L from last to first rank; but again, an additional citation for N completely counterbalances this. In Case C30, it is an additional publication of H that has led to the same rank jump for L, again counterbalanced by an additional citation of N in Case C31. Likewise, an additional publication of Scientist M in Case C49 has moved L from the last rank to first place (tied with N). Here, N is already rather advanced and maintains top rank. Nevertheless, an additional citation again counterbalances the windfall profit of L. Similar changes, not always as dramatic as these, can be found

FIG. 4. Same as Figure 2, but for Example C. The total number of papers and citations in each case is the same as in Example B, so that the summed indicator R(6,k) is not shown again. The symbols below the axis have the same meaning as in Figure 2, but in addition, a red diamond, a blue square, and an orange circle indicate an additional paper for Scientists H, M, and L, respectively. By construction of the example, exactly one of these changes occurs between subsequent cases. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]


frequently. Particularly odd is Case C41, where an additional paper of M does not change M's third position but pushes H from first to last place and L from last to second. In the next step, C42, a citation to a paper of N leads to a drop of N and an advance of L (i.e., they exchange positions, and so do H and M). A further citation for N, Case C43, does not change anything for N but sends L down to last position and H up to top rank (i.e., restores the sequence of Case C40).

Overall, the fluctuations in the ranking for Example C are even stronger than those in Example B. Again, this problem can be solved by the previously proposed alternative calculation of the indicators, averaging over the assigned weights for the tied papers. The results are shown in Figure 5a. The curves are not as smooth as those in Figure 3a, but this cannot really be expected because now the publication record of H, M, and L is not assumed to remain constant.

The respective rankings in Figure 5b show the drastic improvement in comparison to Figure 4b. The exchange of the second and third places in the ranking between M and L at the beginning of the evolution (compare Cases C3 and C8) is the same as in Example B, as discussed earlier in connection with Figure 3b. But in contrast to Example B, now M and L change ranks again between Cases C49 and C50, so that in the end, M is ranked higher than L. The reason for this different final outcome is the different number of uncited papers that are attributed to M and L to achieve the final total numbers of 15 publications for everyone.

Introducing Fractional Scoring

FIG. 5. Same as Figure 4, but for the indicator Rav(6,k) (i.e., for averaged weights). [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

Rousseau (2011, 2012) and Leydesdorff and Bornmann (2012) discussed different scoring rules and their application for the indicator I3(6PR) without coming to a final solution, but rather fearing that they "may have opened a box of Pandora allowing for generating a parameter space of other possibilities" (Leydesdorff & Bornmann, 2012). Attributing averaged weights as proposed earlier appears to be a good solution. However, if one applies the previously used counting rule, one of the remaining problems is that the "best" paper never gets the highest score if the total number of papers is smaller than 100.

Note that the ad hoc suggestion by Leydesdorff and Bornmann (2011) to change the rounding by adding 0.9 to the paper count does not influence the calculation of the indicators in the present example. The authors had introduced this rounding to avoid undesirable effects for data sets that are smaller than 100 papers. The demonstrated inconsistency cannot be solved by this unusual rounding. Moreover, this rounding also can lead to other undesirable effects, such as already putting the 110th of 111 papers into the top-1% class (assuming that 109 papers have fewer citations, yielding a count of 109.9, which is 99.01% of 111 papers), which thus would comprise two of 111 papers (i.e., nearly 2%). So it has more than the claimed "marginal effects for numbers in the set larger than 100" (Leydesdorff & Bornmann, 2012).

A slightly different scoring rule, in agreement with Rousseau (2012), always attributes the highest score to the most cited paper. For this purpose, the papers are sorted by increasing number[4] of citations and given the rank r without caring about the sequence of tied papers, as mentioned earlier. Then, the percentage r × 100/n_tot is calculated and compared with the boundaries of the six PR classes. Specifically, papers with a thus-determined percentage less than or equal to 50.0 are included in Class 1, and so on. Note that in contrast to the earlier utilized scoring rule, now the paper in question is included in the counting, and correspondingly, the percentage boundary also is included in the comparison. (Therefore, the rule is now less than or equal to instead of less than, as mentioned earlier.) With regard to Example A, this change of the counting rule does not influence the determination of I3av(6PR) except for the top-ranked paper, which is now attributed a weight of 6 instead of 5. This leads to an increase of I3av(6PR) to 77; correspondingly, Rav(6,k) = 1.925 for all cases in Table 3. For Examples B and C, more changes occur because often the papers close to, but not exactly at, the borders between the PR classes are given a different weight.
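A sketch of this modified rule with averaged weights for the tied papers (the inclusive comparison of the exact percentage r × 100/n_tot is the only change with respect to the earlier sketches; the names are again mine):

```python
BOUNDS_INCL = [50, 75, 90, 95, 99]   # boundaries now included in the class

def weight_le(percentage):
    """PR-class weight with a 'less than or equal to' comparison."""
    for w, b in enumerate(BOUNDS_INCL, start=1):
        if percentage <= b:
            return w
    return 6

def i3_av_le(counts):
    n_tot = sum(counts.values())
    i3, rank = 0.0, 0
    for c in sorted(counts):
        n = counts[c]
        # rank r runs from rank+1 to rank+n; the paper itself is counted
        i3 += sum(weight_le((rank + i + 1) * 100 / n_tot) for i in range(n))
        rank += n
    return i3, i3 / n_tot

print(i3_av_le({0: 20, 1: 10, 3: 6, 5: 2, 7: 2}))   # Case A1: (77.0, 1.925)
```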

However, a closer inspection reveals that this rule is still not perfect because the final result deviates from the theoretical value 1.91. The deviation can be traced to the discretization. In the present Example A with 40 papers, every paper amounts to 1/40 = 2.5% of the total number. Accordingly, the top-ranked paper contributes the highest weight, w = 6, with a proportion of 2.5% instead of 1% as in the theoretical distribution. On the other hand, using the scoring rule from Leydesdorff et al. (2011) as in Tables 1 to 3, the top two papers would contribute the weight w = 5 with a proportion of 2 × 2.5% = 5% instead of 4% (plus 1% of w = 6) as in the theoretical distribution.

The solution of this problem is a new scoring rule that is independent of the specific way in which papers at the boundaries of the 6 or 100 PR classes are treated. In this new scoring rule, only the 1% fraction of the top paper is given the weight w = 6, and the remaining fraction of 1.5% is given the weight w = 5. This would amount to a contribution[5] of 1% × 6 + 1.5% × 5 = 0.135 to Rfr(6,k), where I have introduced the superscript "fr" to indicate the use of the fractional scoring rule. Concerning the calculation of I3(6PR), this fractionalization of the 2.5% of the total number into 1% of the top class and 1.5% of the second-highest class means that two fifths of the paper get the weight w = 6 and three fifths get the weight w = 5, so that I3fr(6PR) = 0.4 × 6 + 0.6 × 5 = 5.4. This value should not be rounded to 5 but taken as it is. For completeness, note that in the case of tied papers, as mentioned earlier, after assigning the fractional weights to the tied papers, one has to average the weights of the tied papers and reassign the average weight to these tied papers.

But this fractional scoring makes the determination of the weights rather complicated in the general case because for a different total number n_tot, such a fractional attribution of weights would have to be applied not only for the top-ranked paper but at all or nearly all borders between the different PR classes. In the present Example A, all borders except for the 99% boundary coincide exactly with one of the r/n_tot values. Usually this is not the case, and thus a fractional attribution of the weights always would be necessary.[6] Although it is more significant for data sets with a small number of papers, the scoring rule also should be applied for large data sets because, in the general case, it can make a difference for all manuscripts closest to a border between different rank classes. In conclusion, I recommend always applying fractional scoring because, together with the suggested averaging of the fractional weights for the tied papers, it solves all discussed problems and thus closes the previously mentioned Pandora's box.
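Under these definitions, fractional scoring can be sketched as follows (the interval bookkeeping is my own illustration): the paper at rank r occupies the percentage interval ((r - 1)/n_tot, r/n_tot], and its weight mixes the PR classes that this interval overlaps, so that the theoretical class proportions of 50, 25, 15, 5, 4, and 1% are met exactly.

```python
# Cumulative upper boundaries of the six PR classes and their weights.
CLASSES = [(0.50, 1), (0.75, 2), (0.90, 3), (0.95, 4), (0.99, 5), (1.00, 6)]

def frac_weight(lo, hi):
    """Average weight of the rank interval (lo, hi] under the PR classes."""
    w, prev = 0.0, 0.0
    for b, label in CLASSES:
        w += label * max(0.0, min(hi, b) - max(lo, prev))
        prev = b
    return w / (hi - lo)

def r_fr(n_tot):
    """R_fr(6,k) of a complete reference set of n_tot papers. By
    construction it depends only on n_tot, not on the citation counts;
    tied papers would share averaged fractional weights, which leaves
    the sum unchanged."""
    total = sum(frac_weight(r / n_tot, (r + 1) / n_tot) for r in range(n_tot))
    return total / n_tot

print(frac_weight(39/40, 40/40))   # 5.4: the top paper of Example A
print(r_fr(40))                    # 1.91 (up to floating-point noise)
```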

[4] Note that percentiles are more often determined after ranking the papers by decreasing number of citations.

[5] In contrast, in Tables 1 to 3, the top paper got a weight of 5 and contributed 5/40 = 0.125 to R(6,k) and Rav(6,k), except in Case A8, where it achieved only the weight 4 and thus a contribution of 0.1 to R(6,k); for Rav(6,k), the top paper would get the weight 5 also in Case A8 (before averaging over the tied papers). In all these cases, the results are smaller than those for fractional scoring. On the other hand, the previously mentioned scoring rule from Rousseau (2012) would always assign a weight of 6 to the top paper and thus would always lead to a contribution of 6/40 = 0.15, which is above the fractional scoring value.

[6] For very small numbers n_tot < 20, as occurring in Examples B and C, the top-ranked paper would even be spread over three PR classes and would have to be attributed three different weights fractionally from the top three PR classes. For example, in the case n_tot = 16, every paper amounts to 1/16 = 6.25%, and the top paper receives 1.25% of weight 4 from the fourth PR class, the full 4% of weight 5 from the fifth PR class, and the full 1% of weight 6 from the top class—altogether a contribution of 1.25% × 4 + 4% × 5 + 1% × 6 = 0.31 to Rfr(6,k).


Concluding Remarks

Waltman and van Eck (2012) discussed the consistency of performance indicators from a theoretical viewpoint. Formal aspects can be found in the literature cited by Waltman and van Eck (2012). In that article, the theoretical argument is presented completely in intuitive terms, and in particular, the authors demanded that "[i]f two scientists achieve the same absolute performance improvement, then their ranking relative to each other should remain unchanged" (p. 409). Although I have not constructed my Example C with regard to this definition of consistency, a closer inspection of Figure 4 shows several cases in which this condition is violated. Due to the specific evolution of the example cases, there are only 15 instances in which two of the scientists achieve the same performance improvement; namely, the addition of an uncited paper. Specifically, comparing Cases C17 and C19, H and L both get one more uncited paper but exchange their relative ranking. The same happens for them between Cases C43 and C45. Similarly, between Cases C39 and C41, the relative ranking of M and L is changed, although both of them achieve the same performance improvement of one additional uncited paper. And although in Case C54 there is a tie between N and H on ranks 2 and 3, attributing one more uncited paper to each of them leads to rank 1 for N and rank 4 for H in Case C56. Thus, in four of 15 instances, the indicator behaves in an inconsistent way in Example C.[7]

An inspection of Figure 5b shows that the proposed averaging of the weights leads to consistent behavior: In all mentioned cases, the relative ranking remains unchanged.

But in view of the strong fluctuations of the indicators and the rankings in Figure 2, I propose a further property of strict independence,[8] at which a performance indicator should aim: "The ranking of two scientists relative to each other should remain unchanged if a third scientist achieves a performance improvement."

In Example B, in which the performance improvement is solely attributed to Scientist N, this condition is violated frequently, as all the changes in the rankings among H, M, and L in Figure 2b visualize. Again, averaging the weights, the newly proposed indicator behaves in a much more consistent way, as demonstrated by the rankings in Figure 3b that do not fluctuate. Besides the reasonable advance of N from the bottom to the top rank, there is only one change in the ranking, between M and L, which was discussed earlier and explained by the nonlinearity of the indicator. Thus, with this exception, the alternative calculation of the indicators based on averaging over the assigned weights fulfills the newly proposed property of strict independence.

Note that this property is trivially fulfilled by unnormalized indices such as the h-index, which are only a function of the publications of an individual author and the citations to these publications, not using any other information. For normalized indicators such as I3(6PR) and R(6,k), one needs a reference set, which in the present analysis has been taken to be the accumulated citation distribution of the four scientists. Different reference sets allow comparison between different fields, and it is therefore desirable that completely different reference sets can lead to different rankings of groups of scientists with identical citation records but in different fields. In the present examples, the additional publications and citations change the reference set, and thus the various cases could be interpreted as describing a variety of fields. Then, it might be expected that the proposed criterion of strict independence is not fulfilled. However, it is odd that a series of small changes violates the property as strongly and as often as shown in Figures 2b and 4b. In my view, it is much more reasonable that the ranking changes more smoothly, as in Figures 3b and 5b.

The fluctuations in the presented examples occur due to tied papers. I have intentionally constructed examples with a relatively small number of papers and with rather small citation frequencies to enhance the inconsistencies. In a more realistic sample, there would be fewer ties, and thus the jumps and drops of the indicators would happen less frequently. Moreover, the reference set would typically be much larger and at least comprise the citation records of all scientists of an institution (i.e., of many more than four people). Then, small changes in the reference set would not have such a big influence as in the presented examples. When analyzing and comparing complete institutions or journals, or even countries, the database would probably be so large that such problems would not occur at all. And if one takes all publications in a given field as the reference set used for the normalization and measures the citation records of individual scientists or institutes against this reference set, then it is extremely unlikely that the small changes have any influence on the ranking. Nevertheless, as a matter of principle, these indicators should be defined in a way that avoids inconsistencies. As shown earlier, by treating the tied papers in a different way, the erratic fluctuations can be drastically reduced. Changes in the ranking cannot and should not be completely avoided because the nonlinearity of the weights for the different PR classes may still lead to a different ranking, which reflects a difference in the reference set and thus is desirable when comparing different fields.

Acknowledgment

I am grateful to two reviewers for very useful suggestions. I especially thank L. Waltman for the suggestion about the alternative assignment of weights in fractional parts.

[7] In the evolution of Example B, only the performance of Scientist N is changed, so that there cannot be any instance in which two scientists achieve the same performance improvement.

[8] Following Rousseau (2012), I use the term independence rather than consistency. In the present case, this terminology better reflects the situation because it refers to the requirement that the ranking of two scientists should be independent of the performance improvement of a third scientist.


References

Bornmann, L., & Mutz, R. (2011). Further steps towards an ideal method of measuring citation performance: The avoidance of citation (ratio) averages in field-normalization. Journal of Informetrics, 5(1), 228–230.

Leydesdorff, L., & Bornmann, L. (2011). Integrated impact indicators compared with impact factors: An alternative research design with policy implications. Journal of the American Society for Information Science and Technology, 62(11), 2133–2146.

Leydesdorff, L., & Bornmann, L. (2012). Percentile ranks and the integrated impact indicator (I3). Journal of the American Society for Information Science and Technology, 63(9), 1901–1902.

Leydesdorff, L., Bornmann, L., Mutz, R., & Opthof, T. (2011). Turning the tables on citation analysis one more time: Principles for comparing sets of documents. Journal of the American Society for Information Science and Technology, 62(7), 1370–1381.

National Science Board. (2010). Science and engineering indicators. Washington, DC: National Science Foundation. Available at: http://www.nsf.gov/statistics/seind10/

Pudovkin, A.I., & Garfield, E. (2009). Percentile rank and author superiority indexes for evaluating individual journal articles and the author's overall citation performance. CollNet Journal of Scientometrics and Information Management, 3(2), 3–10.

Rousseau, R. (2011). Percentile rank scores are congruous indicators of relative performance or aren't they? arxiv.org/pdf/1108.1860.

Rousseau, R. (2012). Basic properties of both percentile rank scores and the I3 indicator. Journal of the American Society for Information Science and Technology, 63(2), 416–420.

Schreiber, M. (2008). An empirical investigation of the g-index for 26 physicists in comparison with the h-index, the A-index, and the R-index. Journal of the American Society for Information Science and Technology, 59(9), 1513–1522.

Schreiber, M. (2009). A case study of the modified Hirsch index hm accounting for multiple coauthors. Journal of the American Society for Information Science and Technology, 60(6), 1274–1282.

Waltman, L., & van Eck, N.J. (2012). The inconsistency of the h-index. Journal of the American Society for Information Science and Technology, 63(2), 406–415.
