publication potential—an indicator of scientific strength for cross-national comparisons

Scientometrics, Vol. 9, Nos 5-6 (1986) 231-238

PUBLICATION POTENTIAL - AN INDICATOR OF SCIENTIFIC STRENGTH FOR CROSS-NATIONAL COMPARISONS

A. SCHUBERT, A. TELCS

Information Science and Scientometrics Research Unit, Library of the Hungarian Academy of Sciences, H-1361, P. O. Box 7, Budapest (Hungary)

(Received April 14, 1985 in revised form June 5, 1985)

A new indicator, called the publication potential, is proposed to measure the scientific strength of different countries. The indicator is based on SCI author counts and publication frequency distributions. Not depending on national statistical reports, it avoids the ambiguities of statistical definitions and methods, thereby providing a solid ground for cross-national comparisons. Publication-based and statistical survey data for 34 countries are compared and some of the most conspicuous discrepancies are pinpointed.

Introduction

Science manpower seems to be a natural measure of the scientific strength of different countries. Indeed, it is one of the main indicators in almost every national and international collection of science statistics. However, a closer look at these figures convinces us that, although they might be rather useful in establishing within-country trends, they are quite useless in cross-country comparisons. This is because, in spite of the great efforts exerted by Unesco, OECD and other organizations toward standardization of statistical definitions and methods, "national statistical practices and concepts are not necessarily designed for the specific requirements of international comparisons", in the diplomatic style of the Unesco survey [1]. Discrepancies are not confined to small or less developed countries. For example, for the year 1979, Unesco [1] reports 26.1 thousand researchers in the Netherlands, 13.9 thousand in Belgium and 6.0 thousand in Denmark; for the very same year, OECD [2] reports 18.3 thousand researchers in the Netherlands, 10.9 thousand in Belgium and 6.0 thousand in Denmark. Data for Japan are as follows: Unesco 418.0 thousand, OECD 363.5 thousand, while in the National Science Foundation's Science Indicators [3] the corresponding value is 281.9 thousand. These data certainly show the need for other manpower estimates.

Scientometrics 9 (1986). Elsevier, Amsterdam-Oxford-New York; Akadémiai Kiadó, Budapest


To avoid depending on national statistical reports in building science indicators, one has to find some independent data base containing relevant information on the world's scientific activity. One of the most promising attempts is the use of publication data bases, such as the Science Citation Index (SCI) of the Institute for Scientific Information (ISI, Philadelphia, PA, USA). Price [4], Spiegel-Rösing [5], Inhaber [6], and Kovach [7] successfully used SCI author counts as a proxy for scientific manpower in different countries. This approach, however, has its own problems.

SCI's journal coverage is not free from geographic and language biases. It is claimed that there is, in particular, an underrepresentation of Latin-American and Soviet research as well as a lack of some French, German and other national language journals [8].

Our present paper gives credit to the positive experiences of the above authors regarding the use of the SCI data base and, using the same source, attempts, at least partly, to correct the biases encountered.

Previous SCI-based manpower studies [4-7] used author counts only. Authors (more specifically, authors in a given, not too large, time period) represent, however, only a fraction of the total scientific manpower, the visible tip of the iceberg, as it were. This fraction may, moreover, vary from country to country due to the biases in the SCI coverage. Another dimension of publication analysis, so far unused in studies of this kind, can be obtained by combining author counts with publication frequency counts. Instead of a single number representing the number of authors, we will then have a series of numbers representing the number of authors publishing 1, 2, 3, etc. papers in the given period. If one were able to extrapolate this series to include "the number of authors publishing zero papers", then the complete set of "potential authors" or, in other words, the "publication potential" of the given country could be assessed. Such an extrapolation must be based on an adequate theoretical model of the frequency distribution of scientific publication productivity. A large stock of such models is available in the literature stemming from Lotka's pioneering work [9]; in Vlachý's bibliography [10], 437 papers on related topics were enumerated. An early attempt at zero extrapolation was made by Mantell [11] using a Poisson model. Recently, the Waring distribution was found to be a most suitable model of publication productivity in different science fields [12, 13]. Moreover, the Waring model proved to provide excellent scientific manpower estimates for the US states [14], where fairly reliable and comparable control data were available. These results encouraged us to extend our study to obtain a cross-national survey of scientific manpower estimates.



Theory and methods

The Waring distribution with parameters $\alpha > 1$ and $N > 0$ can be defined by the recursion relation

$$P_0 = \frac{\alpha}{\alpha + N}; \qquad P_k = \frac{N + k - 1}{N + k + \alpha}\, P_{k-1}, \quad (k = 1, 2, \ldots) \tag{1}$$

Its expectation is

$$E_0 = \frac{N}{\alpha - 1}. \tag{2}$$
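The recursion (1) is easy to evaluate numerically. The following is a minimal sketch, not the authors' code: the function name `waring_pmf`, the parameter values and the cut-off `k_max` are illustrative assumptions. It builds the probabilities from Eq. (1) and compares the numerically summed mean with Eq. (2).

```python
def waring_pmf(alpha, n_param, k_max):
    """Return [P_0, ..., P_k_max] from the recursion in Eq. (1)."""
    if alpha <= 1 or n_param <= 0:
        raise ValueError("Eq. (1) assumes alpha > 1 and N > 0")
    probs = [alpha / (alpha + n_param)]            # P_0
    for k in range(1, k_max + 1):
        ratio = (n_param + k - 1) / (n_param + k + alpha)
        probs.append(ratio * probs[-1])            # P_k from P_{k-1}
    return probs


# Illustrative parameters only (assumed, not fitted to any country's data).
alpha, n_param = 3.0, 2.0
probs = waring_pmf(alpha, n_param, k_max=100_000)

total = sum(probs)                                 # should be close to 1
mean = sum(k * p for k, p in enumerate(probs))     # numerical expectation
print(f"sum of P_k = {total:.6f}")
print(f"mean = {mean:.6f}, Eq. (2) gives {n_param / (alpha - 1):.6f}")
```

Because the distribution has a heavy tail, the sum has to be carried to a large `k_max` before the numerical mean settles near $N/(\alpha - 1)$; the cut-off used here is an arbitrary choice.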

The notation $E_0$ for the expectation has the following motivation. Let us denote the expectation of a distribution truncated from the left at $s$ by $E_s$; in other words, $E_s$ is the conditional expectation $E(k \mid k \ge s)$. The Waring distribution has the following characteristic property [15, 16]: $E_s$ of a non-negative integer valued discrete distribution is a linear function of $s$ if and only if the distribution is a Waring distribution, namely,

$$E_s = E_0 + \frac{\alpha}{\alpha - 1}\, s. \tag{3}$$
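The linearity stated in Eq. (3) can be checked numerically for a particular parameter pair. The sketch below is only an illustration under assumed values ($\alpha = 3$, $N = 2$, and a finite cut-off of the infinite sums); the helper names are hypothetical.

```python
def waring_pmf(alpha, n_param, k_max):
    """Probabilities P_0 .. P_k_max from the recursion in Eq. (1)."""
    probs = [alpha / (alpha + n_param)]
    for k in range(1, k_max + 1):
        probs.append(probs[-1] * (n_param + k - 1) / (n_param + k + alpha))
    return probs


def truncated_expectation(probs, s):
    """E(k | k >= s), computed from a numerically truncated pmf list."""
    tail = probs[s:]
    mass = sum(tail)
    return sum(k * p for k, p in enumerate(tail, start=s)) / mass


# Assumed illustrative parameters.
alpha, n_param = 3.0, 2.0
probs = waring_pmf(alpha, n_param, k_max=100_000)

e0 = n_param / (alpha - 1.0)        # Eq. (2)
slope = alpha / (alpha - 1.0)       # coefficient of s in Eq. (3)
rows = []
for s in range(6):
    numeric = truncated_expectation(probs, s)
    predicted = e0 + slope * s
    rows.append((s, numeric, predicted))
    print(f"s={s}  numeric={numeric:.5f}  Eq.(3)={predicted:.5f}")
```

For $s = 1$ the result can also be verified by hand: $E_1 = E_0 / (1 - P_0) = (\alpha + N)/(\alpha - 1) = E_0 + \alpha/(\alpha - 1)$.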

Eq. (3) can readily be used in analysing empirical samples. In these applications, the sample mean of the $s$-truncated sample,

$$e_s = \frac{\sum_{k \ge s} k\, n_k}{\sum_{k \ge s} n_k}, \tag{4}$$

can be regarded as an estimator of $E_s$. Here, $n_k$ is the $k$-th absolute frequency, i.e., in the case of publication productivity, the number of authors having exactly $k$ papers; thus $e_s$ is the mean productivity of authors having not less than $s$ papers.

By plotting $e_s$ versus $s$, a straight line should be obtained.
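As a small illustration of Eq. (4), the sketch below computes the truncated sample means from a frequency table. The function name and the frequency table are made up for the example and do not come from the SCI data.

```python
def truncated_sample_means(freq):
    """Map s -> e_s (Eq. 4) for s = 1 .. largest observed k.

    freq maps k (papers per author) to n_k (authors with exactly k papers).
    """
    k_max = max(k for k, n in freq.items() if n > 0)
    means = {}
    for s in range(1, k_max + 1):
        papers = sum(k * n for k, n in freq.items() if k >= s)
        authors = sum(n for k, n in freq.items() if k >= s)
        means[s] = papers / authors
    return means


# Made-up frequency table: 6 authors with 1 paper, 3 with 2, 1 with 3.
freq = {1: 6, 2: 3, 3: 1}
means = truncated_sample_means(freq)
for s, e_s in means.items():
    print(s, e_s)
```

With this toy table, $e_1 = 15/10$, $e_2 = 9/4$ and $e_3 = 3/1$; in a real sample the highest values of $s$ rest on very few authors, so those points are the noisiest.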



The linearity of the $e_s$ versus $s$ plot allows:
(1) a simple graphical test of goodness of fit;
(2) estimation of the parameters of the distribution from the slope and the intercept of the straight line.

Details of the proper methods of fitting the straight line were published elsewhere [14]. In practice, the series $e_1, e_2, \ldots$ can directly be calculated from the sample and the intercept of the fitted straight line provides an estimator, $e_0$, of $E_0$. Since, by definition,

$$E_0 = S / T, \tag{5}$$

where $S$ is the number of publications and $T$ is the publication potential (number of "authors" including those publishing zero papers in the given period), an estimator, $\hat{T}$, of $T$ can be given as

$$\hat{T} = S / e_0, \tag{6}$$

where both $S$ and $e_0$ can be determined from the sample.
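The estimation chain of Eqs. (2), (3) and (6) can be sketched as follows. The paper refers to [14] for the proper fitting method, which is not reproduced here; ordinary least squares is used purely as a stand-in assumption, and the function names and the synthetic input are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares (an assumed stand-in); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_potential(s_values, e_values, total_papers):
    """Estimate e_0, alpha, N and the publication potential T_hat."""
    slope, e0 = fit_line(s_values, e_values)
    alpha = slope / (slope - 1.0)      # slope = alpha / (alpha - 1), Eq. (3)
    n_param = e0 * (alpha - 1.0)       # E_0 = N / (alpha - 1), Eq. (2)
    potential = total_papers / e0      # Eq. (6)
    return {"e0": e0, "alpha": alpha, "N": n_param, "T_hat": potential}


# Synthetic, noise-free points on the line e_s = 1 + 1.5 s (alpha = 3, N = 2).
s_values = [1, 2, 3, 4, 5]
e_values = [1.0 + 1.5 * s for s in s_values]
result = estimate_potential(s_values, e_values, total_papers=1000)
print(result)

# Arithmetic check of Eq. (6) with the values quoted in Fig. 1 for Australia.
australia_t_hat = 14.9e3 / 0.595
print(round(australia_t_hat / 1e3, 1), "thousand")
```

The last two lines only redo the division shown in Fig. 1; they do not refit the Australian data, which are not given in the paper.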

[Figure 1: $e_s$ plotted against $s$ with the fitted straight line. Annotations in the figure: $r = 0.968$; $S = 14.9 \cdot 10^3$; $e_0 = 0.595$; $\hat{T} = S / e_0 = 25.0 \cdot 10^3$.]

Fig. 1. Plot of $e_s$ versus $s$ for Australia (1978-79)


A typical $e_s$ versus $s$ plot (data of Australia) is presented in Fig. 1. Outlines of the calculations are also indicated in the figure. The goodness of fit is attested by the fairly high value of the correlation coefficient ($r = 0.968$).

Data sources

Publication data were obtained from the SCI data base. Magnetic tapes of the two-year period 1978-79 were processed. Papers were assigned to different countries according to the mailing address of their first author, as indicated in the byline of the publication. To maintain the statistical reliability of the results, only countries publishing more than 1000 papers in the two-year period were included in the study.

Wherever available, Unesco [1] and OECD [2] estimates of scientific manpower were also considered for comparison.

Results and discussion

Author and publication counts, estimated publication potentials, and Unesco and OECD scientific manpower estimates are presented in Table 1.

Two ratios of particular interest can be formed from the tabulated data. By comparing the estimated publication potential to the Unesco and/or OECD estimates, some of the earlier mentioned definitional ambiguities of national statistics can be clearly pinpointed. There are ten countries in which the estimated publication potential is less than half of the Unesco manpower estimate. These are: Bulgaria, Czechoslovakia, Egypt, the German DR, Hungary, Japan, Poland, Rumania, the USSR, and Yugoslavia. It can thus be established that the official statistics of the East-European countries, without exception, highly overestimate their scientific manpower (at least in the case of Hungary, the authors are convinced that the publication potential estimate is much more realistic) and so does Egypt.

The case of Japan is more complex. On the one hand, there certainly are some controversies about the definition of researchers, as attested by the discrepancies between the Unesco, OECD, and NSF data mentioned in the Introduction. On the other hand, Japanese author counts are very unreliable because of the extremely high frequency of homonymy. (In other countries, homonymy is statistically balanced by variants of orthography of other names.) This is supported by the implausibly high Japanese publication/author ratio.

At the opposite end of the scale, there are Brazil and Canada, whose estimated publication potential is more than double the Unesco estimate. In the case of Brazil, our estimation procedure yields a relatively high uncertainty due to the extremely high proportion of the "zero-class". Nevertheless, the results suggest that, in contrast to the East-European countries, Brazilian Unesco manpower data represent (or even underrepresent) a real reserve of scientific potential, of which only a small fraction appears in the SCI-covered literature. (Comparison of Brazilian and Bulgarian data in Table 1 is rather instructive in this respect.) In Canada, publication data make it obvious that both Unesco and OECD data underestimate the true number of active researchers. The reason for this fact is unclear to us.

Table 1
Publication and survey based estimates of scientific manpower (in thousands)

Country           Author  Publication  Publication    Unesco      OECD
                   count        count    potential  estimate  estimate
Australia            8.5         14.9         25.0      22.5      22.3
Austria              2.1          4.0          5.4       5.4       5.4
Belgium              3.4          5.9         16.3      13.9      10.9
Brazil               1.6          2.3         49.4      24.0      ****
Bulgaria             1.2          1.9          4.8      35.9      ****
Canada              16.5         29.3         53.6      26.2      26.3
Czechoslovakia       3.5          6.2         10.5      51.7      ****
Denmark              2.9          5.7          6.7       6.0       6.0
Egypt                1.0          1.7          4.0      10.7      ****
Finland              2.0          4.0          4.7       8.4       7.4
France              20.4         36.0         58.4      72.9      72.9
German DR            3.8          6.6         14.3     110.5      ****
Germany FR          22.5         43.6         68.6     122.0     122.0
Greece               0.7          1.1          5.3       2.6       4.3
Hungary              2.2          3.9          7.2      25.3      ****
India                9.?         21.6         29.7      28.2      ****
Ireland              0.8          1.3          4.3       2.6       2.6
Israel               3.7          7.0          9.1      14.7      ****
Italy                7.4         13.2         21.7      40.8      46.4
Japan               12.6         39.2         22.8     418.0     363.5
Netherlands          6.1         10.1         17.4      26.1      18.3
New Zealand          2.0          3.3          7.4       3.7       8.1
Nigeria              0.7          1.2          2.8       2.2      ****
Norway               1.9          3.6          4.6       6.5       7.1
Poland               4.7          8.2         18.2      92.8      ****
Rumania              0.8          1.5          3.5      ****      ****
South Africa         2.2          4.8          8.8      ****      ****
Spain                2.1          3.5         11.4       7.9       8.7
Sweden               5.3         10.7         12.4      14.8      14.8
Switzerland          5.5          9.2         21.6      16.4      10.7
UK                  35.3         63.6        108.2     122.0     104.0
USA                153.4        255.4        626.6     637.3     621.0
USSR                33.4         56.8        201.8    1340.6      ****
Yugoslavia           1.1          1.7          7.6      22.4      22.4
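The first ratio, publication potential over the Unesco estimate, and the publication/author ratio mentioned for Japan can be recomputed from Table 1. The sketch below uses only five of the 34 rows, chosen because the text discusses them; the variable and function names are illustrative assumptions.

```python
# (author count, publication count, publication potential, Unesco estimate),
# all in thousands, copied from Table 1 for five of the 34 countries.
TABLE_1_SUBSET = {
    "Australia": (8.5, 14.9, 25.0, 22.5),
    "Brazil": (1.6, 2.3, 49.4, 24.0),
    "Canada": (16.5, 29.3, 53.6, 26.2),
    "Hungary": (2.2, 3.9, 7.2, 25.3),
    "Japan": (12.6, 39.2, 22.8, 418.0),
}


def potential_to_unesco(row):
    """Estimated publication potential divided by the Unesco estimate."""
    return row[2] / row[3]


def papers_per_author(row):
    """Publication count divided by author count."""
    return row[1] / row[0]


summary = {}
for country, row in TABLE_1_SUBSET.items():
    summary[country] = (potential_to_unesco(row), papers_per_author(row))
    ratio, per_author = summary[country]
    print(f"{country:10s} potential/Unesco = {ratio:.2f}  "
          f"papers/author = {per_author:.2f}")

# Thresholds used in the text: less than half, more than double.
below_half = sorted(c for c, (r, _) in summary.items() if r < 0.5)
above_double = sorted(c for c, (r, _) in summary.items() if r > 2.0)
print("below half of Unesco:", below_half)
print("more than double Unesco:", above_double)
```

In this subset Hungary and Japan fall below one half and Brazil and Canada above two, while Japan also shows by far the highest number of papers per author.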



The other characteristic ratio is that of author count to publication potential. This ratio measures to what extent the publication potential of a country actually manifests itself in the form of authors in the SCI-covered journals. The smaller this ratio, the greater is the "silent" fraction of the national scientist population. This fraction can be considered, on the other hand, a pool of prospective "newcomer" authors. And indeed, Fig. 2 demonstrates the correlation between the fraction of "newcomer" SCI authors in 1979 (names not occurring in any of the three previous years) and the authors/publication potential ratio.

[Figure 2: scatter plot of the countries, labelled by country code. Horizontal axis: author count per publication potential (0 to 0.5); vertical axis: fraction of newcomers in authors (0.3 to 0.6); $r = -0.842$.]

Fig. 2. Fraction of newcomers in authors versus author count/publication potential ratio. Country codes: AUS Australia, AUT Austria, BEL Belgium, BGR Bulgaria, BRA Brazil, CAN Canada, CHE Switzerland, CSK Czechoslovakia, DDR German DR, DEU Germany FR, DNK Denmark, EGY Egypt, ESP Spain, FIN Finland, FRA France, GBR United Kingdom, GRE Greece, HUN Hungary, IND India, IRL Ireland, ISR Israel, ITA Italy, JPN Japan, NGA Nigeria, NLD Netherlands, NOR Norway, NZL New Zealand, POL Poland, ROM Rumania, SUN USSR, SWE Sweden, USA USA, YUG Yugoslavia, ZAF South Africa

Although the values for Brazil and Japan are dubious for the above-mentioned reasons (homonymy, of course, also results in an unreasonably low fraction of apparent newcomers: many of them share their names with older colleagues), even without both of them the tendency is undeniably clear.
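The author count to publication potential ratio can likewise be recomputed from Table 1. The sketch below covers four countries only, and treating the "silent" fraction as one minus the ratio is an interpretation assumed for the example, not a definition given in the text.

```python
# (author count, publication potential) in thousands, copied from Table 1.
SUBSET = {
    "Brazil": (1.6, 49.4),
    "Denmark": (2.9, 6.7),
    "Hungary": (2.2, 7.2),
    "Japan": (12.6, 22.8),
}


def visible_fraction(author_count, potential):
    """Share of the publication potential that appears as SCI authors."""
    return author_count / potential


fractions = {c: visible_fraction(*values) for c, values in SUBSET.items()}
ordered = sorted(fractions.items(), key=lambda item: item[1])
for country, frac in ordered:
    # "silent" = 1 - visible is an assumed reading of the text.
    print(f"{country:8s} visible = {frac:.3f}  silent = {1 - frac:.3f}")
```

Brazil sits at the low end and Japan at the high end of this ratio, which matches their positions at the two extremes of the horizontal axis in Fig. 2.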



We may conclude that the estimated publication potential seems to bear real significance and may be considered a valid indicator of the scientific strength of different countries.

References

1. Unesco, Division of Statistics on Science and Technology - Office of Statistics, Current Surveys and Research in Statistics, Statistics on Science and Technology, Unesco, Paris, 1982.

2. OECD, Science and Technology Indicators, Resources Devoted to R&D, OECD, Paris, 1984.

3. National Science Board, National Science Foundation, Science Indicators 1982. An Analysis of the State of U.S. Science, Engineering, and Technology, Government Printing Office, Washington, D.C., 1983.

4. D. de SOLLA PRICE, Little Science, Big Science, Columbia University Press, New York and London, 4th ed., 1971.

5. I. S. SPIEGEL-RÖSING, Journal authors as an indicator of scientific manpower: A methodological study using data for the two Germanies and Europe, Science Studies, 2 (1972) 337.

6. H. INHABER, Distribution of world science, Geoforum, 6 (1976) 231.

7. E. G. KOVACH, Country trends in scientific productivity, In: ISI's Who Is Publishing in Science. An International Directory of Scientists and Scholars in the Life, Physical, Social, and Applied Sciences, Institute for Scientific Information, Philadelphia, PA, 1978, pp. 33-40.

8. F. NARIN, P. CARPENTER, The adequacy of the Science Citation Index (SCI) as an indicator of international scientific activity, Journal of the American Society for Information Science, 32 (1981) 430.

9. A. J. LOTKA, The frequency distribution of scientific productivity, Journal of the Washington Academy of Sciences, 16 (1926) 317.

10. J. VLACHÝ, Frequency distributions of scientific performance. A bibliography of Lotka's law and related phenomena, Scientometrics, 1 (1978) 109.

11. L. H. MANTELL, On laws of special abilities and the production of scientific literature, Annals of Documentation, 17 (1966) 8.

12. A. SCHUBERT, W. GLÄNZEL, A dynamic look at a class of skew distributions. A model with scientometric applications, Scientometrics, 6 (1984) 149.

13. A. SCHUBERT, W. GLÄNZEL, Prompt nuclear analysis literature: A cumulative advantage approach, Journal of Radioanalytical and Nuclear Chemistry, Articles, 82/1 (1984) 215.

14. A. SCHUBERT, A. TELCS, Estimation of publication potential in 51 US states based on the frequency distribution of scientific productivity, Journal of the American Society for Information Science, to be published.

15. W. GLÄNZEL, A. TELCS, A. SCHUBERT, Characterization by truncated moments and its application to Pearson-type distributions, Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 66 (1984) 173.

16. A. TELCS, W. GLÄNZEL, A. SCHUBERT, Characterization and statistical test using truncated expectations for a class of skew distributions, Mathematical Social Sciences, 10 (1985) 169.

