Improved author profiling through the use of citation classes

Bart Thijs (1), Wolfgang Glänzel (1, 2), Koenraad Debackere (1)
(1) Centre for R&D Monitoring and Dept. MSI, KU Leuven, Belgium
(2) Dept. Science Policy & Scientometrics, LHAS, Budapest, Hungary

DESCRIPTION

Improved author profiling through the use of citation classes. Bart Thijs, Koenraad Debackere, Wolfgang Glänzel. Presented at the STI 2014 Conference at Leiden University, Leiden, The Netherlands.

TRANSCRIPT

Page 1: Improved author profiling through the use of citation classes

Bart Thijs (1), Wolfgang Glänzel (1, 2), Koenraad Debackere (1)

(1) Centre for R&D Monitoring and Dept. MSI, KU Leuven, Belgium
(2) Dept. Science Policy & Scientometrics, LHAS, Budapest, Hungary

Page 2: Improved author profiling through the use of citation classes

Structure of presentation

1. Introduction

2. CSS-Method

3. Data

4. Results

5. Conclusions

Improved author profiling through the use of citation classes, Leiden, 2014 2/18

Page 3: Improved author profiling through the use of citation classes

Introduction

At the level of research teams and individual scientists, standard bibliometric indicators are easily distorted by the citation impact of papers that differ considerably from their expected value.

Earlier we proposed a parameter-free solution providing four performance classes to replace mean-value based indicators.

☞ This approach is based on the method of Characteristic Scores and Scales (CSS), originally introduced by Glänzel & Schubert (1988) in the context of journal analysis. It can be interpreted as a reduction of the original citation distribution to a distribution over a given number (four) of performance classes with self-adjusting thresholds, without requiring arbitrarily pre-set values.

This has been found useful at the level of national and institutional assessment of citation impact.

Page 5: Improved author profiling through the use of citation classes

Introduction

The proposed method has several striking advantages over traditional methods.

• It completely eliminates the effect of the heavy tails of citation distributions, including their typical outlier-based biases at lower levels of aggregation.

• It does not remove these tails from the analysis, as these values represent the high end of research performance and deserve special attention.

• It is not sensitive to ties, which affect percentile-based methods.

• It ensures seamless integration into the standard tools of scientometric performance assessment.

• It proved to be insensitive to both subject-specific peculiarities and the particular choice of publication years and citation windows.

Page 6: Improved author profiling through the use of citation classes

Introduction

The application of this method at the level of individual scientists and research teams remains a challenge.

Career evolution of individual scientists and changing team composition, along with the typically small paper sets underlying the citation distribution at this level, require extremely stable and robust solutions.

This study analyzes to what extent the CSS-based method meets these requirements. We also present some examples of its application at this level.

Page 9: Improved author profiling through the use of citation classes

CSS-Method

Characteristic scores are obtained

• by iteratively calculating the mean value of a citation distribution and

• subsequently truncating this distribution by removing all papers with fewer citations than the conditional mean.

This procedure is stopped after three iterations.

As a result we obtain three scores b_k, with k = 1, 2, 3. By adding b_0 = 0 and b_4 = ∞, the following four classes can be defined:

[b_0, b_1) is the class of ‘poorly cited’ papers,

[b_1, b_2) contains ‘fairly cited’ papers,

[b_2, b_3) contains ‘remarkably cited’ papers and

[b_3, b_4) is the class of ‘outstandingly cited’ papers.

☞ The scores for k = 2 and k = 3 (b_2 and b_3) can be used to identify highly cited papers.
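
The mean-and-truncate procedure described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `characteristic_scores` and `citation_class` and the ten citation counts are hypothetical, and the sketch ignores the subject and publication-year normalisation that a real application would need.

```python
from bisect import bisect_right


def characteristic_scores(citations, iterations=3):
    """Return the scores b_1..b_k from repeated mean-and-truncate steps."""
    remaining = list(citations)
    scores = []
    for _ in range(iterations):
        if not remaining:
            break
        mean = sum(remaining) / len(remaining)
        scores.append(mean)
        # Remove all papers with fewer citations than the current mean.
        remaining = [c for c in remaining if c >= mean]
    return scores


def citation_class(citation_count, scores):
    """Map a citation count to class 1..4 via the intervals [b_(k-1), b_k)."""
    # bisect_right counts how many scores are <= citation_count.
    return bisect_right(scores, citation_count) + 1


# Hypothetical citation counts for ten papers (illustration only).
counts = [0, 0, 1, 1, 2, 3, 5, 8, 13, 40]
scores = characteristic_scores(counts)
classes = [citation_class(c, scores) for c in counts]
```

With these toy counts the three scores are 7.3, 61/3 and 40, so seven papers fall into Class 1, two into Class 2, none into Class 3 and one into Class 4. The empty third class shows how small paper sets can produce sparse classes.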

Page 11: Improved author profiling through the use of citation classes

Data

The data set was built from downloaded ResearcherID data of 4,271 registered researchers from eight selected countries. Only authors with at least 20 publications are taken into consideration. This selection creates a built-in bias.

• Authors with a ResearcherID are more productive than others (see Heeffer et al., 2013).

• The applied threshold implies the exclusion of occasional authors whose impact is rather limited.

☛ The goal of the present study is to investigate the applicability of the CSS-method at this level and not to define specific reference standards.

Page 13: Improved author profiling through the use of citation classes

Data

Moreover, in a real-world evaluative exercise or selection procedure, the studied authors are indeed the more active ones.

• Publication data is retrieved from Web of Science.

• Journal publications of type @, L, N and R indexed between 1991 and 2010 are selected.

• Subject classification is based on journal assignment according to the Leuven-Budapest scheme.

Page 15: Improved author profiling through the use of citation classes

Results: Performance classes

First, the distribution of papers across the classes was calculated.

1. One scenario calculates the average distribution across all selected authors.

2. The second scenario calculates the distribution of all unique papers.

The distribution of 2010 papers is included for comparison.

Distribution of papers over performance classes (%)

Data Set                             Class 1  Class 2  Class 3  Class 4
Average over all authors with R-ID      42.8     33.1     15.0      9.1
All papers of authors with R-ID         44.3     32.9     14.5      8.4
All papers indexed in 2010              69.8     21.5      6.2      2.5
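
The difference between the two scenarios can be made concrete with a small Python sketch. It assumes that every paper already carries a class label from 1 to 4. The function names, author keys and paper identifiers are hypothetical and serve only as an illustration.

```python
from collections import Counter


def shares(classes):
    """Share of items in each of the four performance classes."""
    counts = Counter(classes)
    n = len(classes)
    return [counts[k] / n for k in (1, 2, 3, 4)]


def average_over_authors(papers_by_author):
    """Scenario 1: compute each author's profile, then average the profiles."""
    profiles = [shares(list(p.values())) for p in papers_by_author.values()]
    return [sum(p[k] for p in profiles) / len(profiles) for k in range(4)]


def all_unique_papers(papers_by_author):
    """Scenario 2: pool the papers, counting co-authored papers only once."""
    unique = {}
    for papers in papers_by_author.values():
        unique.update(papers)
    return shares(list(unique.values()))


# Hypothetical toy data: paper id -> class label; paper p4 is co-authored.
papers_by_author = {
    "author_a": {"p1": 1, "p2": 1, "p3": 2, "p4": 4},
    "author_b": {"p4": 4, "p5": 3},
}
scenario_1 = average_over_authors(papers_by_author)
scenario_2 = all_unique_papers(papers_by_author)
```

In the first scenario every author carries the same weight regardless of output, whereas in the second every paper carries the same weight. This may explain why the two rows for authors with R-ID in the table differ slightly.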

Page 17: Improved author profiling through the use of citation classes

Distribution of Performance Classes

In the first scenario, we can investigate the distribution of shares in each class.

☛ The strong deviation from normality in Class 4 reflects once again the problematic behaviour of extreme values and their particular distribution.

[Figure: histograms of the distribution of shares in Class 1 and in Class 4.]

The distribution of shares in the first two classes is approximately normal. The third class deviates slightly, the last one strongly, from a normal distribution.

Page 19: Improved author profiling through the use of citation classes

CSS and traditional citation indicators

Rank correlation between performance classes and citation indicators for data set 1

            Class 1  Class 2  Class 3  Class 4     RCR    NMCR
Class 2      -0.459
Class 3      -0.743   0.058*
Class 4      -0.685  -0.055*   0.485
RCR          -0.630  -0.001*   0.547    0.734
NMCR         -0.839   0.125    0.693    0.863    0.786
NMCR/RCR     -0.632   0.202    0.482    0.547    0.163   0.701

☛ These observations substantiate that citation behaviour cannot sufficiently be represented by one class or any individual indicator alone.

This is a strong argument for the choice of this method with four performance classes at this aggregation level too.
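
The slide does not say which rank correlation coefficient was used. As an illustration only, the following sketch assumes Spearman's coefficient, computed as the Pearson correlation of average ranks. The function names and the five data points are hypothetical and are not taken from the study.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors.

    Assumes neither input is constant (otherwise the variance is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical values for five authors: share of Class 1 papers and NMCR.
class_1_share = [0.30, 0.45, 0.50, 0.62, 0.70]
nmcr = [3.1, 1.9, 2.2, 0.9, 0.8]
rho = spearman(class_1_share, nmcr)
```

For these toy values the coefficient is strongly negative: a larger share of poorly cited papers goes with a lower NMCR. The sign agrees with the Class 1 column of the table, although the magnitude is of course specific to the toy data.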

Page 21: Improved author profiling through the use of citation classes

Author Profiling

Different profiles can be defined according to the deviation from the baseline distribution.

⇒ As we have four different classes, a variety of possible deviations might occur.

⇒ Each author can thus be characterized by an individual profile indicating the distribution of their papers over the four performance classes.

⇒ The advantage of these profiles is that they enable a direct comparison between distinct authors, and they may also be compared with an appropriate reference distribution.

Page 22: Improved author profiling through the use of citation classes

Author Profiling

A χ2-test indicates whether or not an individual profile deviates from the reference at significance level 0.05.

• If H0 is rejected, the deviations in classes 1 and 4 are used to identify the profile.

Five profile types and direction of deviation in classes 1 and 4

Profile  χ2-test  Class 1  Class 4  Authors
I        *        below    below        211
II       *        above    below        579
III                                    2603
IV       *        below    above        845
V        *        above    above         33
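
The profile assignment can be sketched as follows. The details are assumptions, because the slide does not spell them out: the sketch uses a χ2 goodness-of-fit test on an author's paper counts per class, with 3 degrees of freedom and the standard critical value 7.815 at the 0.05 level. The reference shares 44.3%, 32.9%, 14.5% and 8.4% are those reported in the talk, renormalised because they sum to 100.1% after rounding. The function names and the example counts are hypothetical.

```python
# Reference shares of Classes 1..4 as reported in the talk.
REFERENCE = (0.443, 0.329, 0.145, 0.084)
# Critical value of the chi-square distribution, df = 3, alpha = 0.05.
CHI2_CRITICAL_DF3 = 7.815


def chi2_statistic(counts, reference=REFERENCE):
    """Goodness-of-fit statistic of observed class counts vs. the reference."""
    n = sum(counts)
    total = sum(reference)  # renormalise the rounded reference shares
    stat = 0.0
    for observed, share in zip(counts, reference):
        expected = n * share / total
        stat += (observed - expected) ** 2 / expected
    return stat


def profile_type(counts, reference=REFERENCE):
    """Return profile I..V from the deviations in Class 1 and Class 4."""
    if chi2_statistic(counts, reference) <= CHI2_CRITICAL_DF3:
        return "III"  # H0 not rejected: no significant deviation
    n = sum(counts)
    total = sum(reference)
    class_1_above = counts[0] / n > reference[0] / total
    class_4_above = counts[3] / n > reference[3] / total
    return {
        (False, False): "I",
        (True, False): "II",
        (False, True): "IV",
        (True, True): "V",
    }[(class_1_above, class_4_above)]


# Hypothetical author with 40 papers: 12, 8, 8 and 12 in Classes 1..4.
example = profile_type([12, 8, 8, 12])
```

The hypothetical author has a smaller share in Class 1 and a larger share in Class 4 than the reference, and the deviation is significant, so the sketch assigns type IV. For small paper sets the expected count in Class 4 can fall below the usual rule-of-thumb minimum of 5, so the χ2 approximation should be read with caution.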

Page 23: Improved author profiling through the use of citation classes

Author profiling

[Figure: Average share of performance classes (Class 1 to Class 4) for each profile type (I to V), on a scale from 0% to 100%.]

Page 24: Improved author profiling through the use of citation classes

Author Profiling

Average citation indicators for each profile

Indicators                I      II     III     IV      V
Average Publication    37.6    38.3    30.7   37.8   31.6
Average RCR            1.33    0.88    1.28   2.22   3.39
Average NMCR           2.06    0.85    1.80   4.36   5.87
Average NMCR/RCR       1.62    0.99    1.44   2.01   1.49

Page 25: Improved author profiling through the use of citation classes

Author profiling

Sample of 20 authors

Author     Class 1  Class 2  Class 3  Class 4    RCR   NMCR  NMCR/RCR  Type
1            31.8%    27.3%    36.4%     4.5%   1.32   2.01      1.53  I
2            39.3%    60.7%     0.0%     0.0%   1.13   1.24      1.09  I
3            32.5%    45.0%    22.5%     0.0%   0.98   1.59      1.61  I
4            40.0%    24.4%    33.3%     2.2%   1.36   1.65      1.21  I
5            67.5%    28.6%     1.3%     2.6%   1.30   1.01      0.77  II
6            58.3%    32.5%     7.3%     2.0%   1.00   1.18      1.19  II
7            74.2%    16.1%     9.7%     0.0%   1.21   0.75      0.62  II
8            72.0%    20.0%     8.0%     0.0%   0.65   0.82      1.27  II
9            68.2%    27.3%     0.0%     4.5%   0.82   0.79      0.96  III
10           33.3%    52.4%     9.5%     4.8%   1.56   1.82      1.17  III
11           61.9%    38.1%     0.0%     0.0%   0.70   0.80      1.15  III
12           33.3%    28.9%    20.0%    17.8%   1.89   3.35      1.77  III
13           21.4%    21.4%    14.3%    42.9%   2.65   5.85      2.21  IV
14            9.4%    50.0%    21.9%    18.8%   1.68   2.72      1.62  IV
15           17.9%    14.3%    28.6%    39.3%   3.21   6.34      1.97  IV
16           30.0%    20.0%    40.0%    10.0%   1.80   2.29      1.27  IV
17           56.8%    22.4%    11.2%     9.6%   1.47   1.83      1.24  V
18           62.5%    16.1%    12.5%     8.9%   1.38   1.92      1.39  V
19           47.6%     9.5%     9.5%    33.3%  14.58  35.46      2.43  V
20           50.0%    10.7%    14.3%    25.0%   3.75   3.69      0.98  V

Reference    44.3%    32.9%    14.5%     8.4%

Page 26: Improved author profiling through the use of citation classes

Conclusions

This study confirms the general applicability of the CSS-method at the individual level and to author profiling of candidates with scientific excellence.

• The distribution of shares in the first two classes is close to normal.

• At the tail we observed strong deviation.

• The distribution over classes is only partially correlated with traditional citation indicators.

• Five profile types can be identified. Presence/absence of publications in the outer classes compensates for the citation scores.

• Relevant reference standards have to be chosen for each application of the profiling method.

Page 27: Improved author profiling through the use of citation classes

Thank you very much for your attention.