
Page 1: RESEARCH: METRICS, QUALITY, AND MANAGEMENT

Peter Andras
School of Computing Science
Newcastle University
peter.andras@ncl.ac.uk

Page 2: Overview

- Introduction
- Research metrics
- Science domains and the evolution of science
- Quality and performance
- Science management

Page 3: Introduction – 1

- RAE – performance and funding
- REF – new metrics-based performance measurement system (2014)
- Scientometrics, bibliometrics
- Other countries: Netherlands, Australia, US

Page 4: Introduction – 2

- What do we measure by metrics?
- What can be measured by metrics?
- How are metrics related to performance and quality?
- How can higher education and research management use science metrics?

Page 5: Origins of research metrics

- Derek de Solla Price – 1960s
- Measuring 'big' science – physics
- Measuring national science output – papers, citations, patents
- Bibliometrics and scientometrics – 1970s to today
- Early work: counts
- Recent work: networks, distributions and conditional counts

Page 6: What is measured – 1

- Publication counts within a time window – at least a year, and usually across a range of publication venues (conferences, journals, books)
- Citation counts of the counted publications – citation counts per year may grow for a while and then shrink towards zero over time
- Links – co-authorship, co-citation, citing – Google PageRank (see the sketch below)
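The link-based measures can be made concrete with a toy example. Below is a minimal Python sketch (not from the slides; the four-paper network and the damping value are illustrative) of PageRank-style scoring on a citation graph, where a paper's score grows with citations from highly scored papers:

```python
# Minimal PageRank sketch on a hypothetical citation network.
# Edges point from the citing paper to the cited paper, so score
# flows towards frequently (and influentially) cited papers.

cites = {                       # paper -> papers it cites
    "A": ["B", "C"],
    "B": ["C"],
    "C": [],
    "D": ["C", "B"],
}

def pagerank(graph, damping=0.85, iterations=50):
    """Power iteration; dangling nodes spread their score uniformly."""
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        new = {node: (1.0 - damping) / n for node in graph}
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new[t] += share
            else:  # dangling node: distribute its score to all nodes
                for t in new:
                    new[t] += damping * rank[node] / n
        rank = new
    return rank

print(pagerank(cites))   # "C" scores highest: cited by A, B and D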

Page 7: What is measured – 2

Indices:
- h-index – the largest n such that the author has n publications, each with at least n citations
- g-index – the largest m such that the author's top m publications have on average at least m citations, i.e. at least m^2 citations in total (see the sketch below)

Journal metrics:
- Impact factor – average (expected) citations per paper within a time window (2 years); e.g. a journal's 2015 impact factor is the number of 2015 citations to its 2013-2014 items, divided by the number of citable items it published in 2013-2014
- Immediacy index – the ratio of papers in a journal that are cited within a year
- Citation half-life – the median age of the papers that are currently cited
- Journal h-index
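Both indices can be computed directly from a list of per-paper citation counts. Below is a minimal Python sketch (not part of the original slides; the citation counts are hypothetical):

```python
# Minimal sketch: h-index and g-index from per-paper citation counts.

def h_index(citations):
    """Largest n such that n papers have at least n citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

def g_index(citations):
    """Largest g such that the top g papers have >= g^2 citations in total."""
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, cites in enumerate(ranked, start=1):
        total += cites
        if total >= rank * rank:
            g = rank
    return g

papers = [42, 17, 9, 6, 5, 3, 1, 0]   # hypothetical citation counts
print(h_index(papers))   # 5 – five papers with at least 5 citations each
print(g_index(papers))   # 7 – the top 7 papers have 83 >= 49 citations
```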

Page 8: Metrics

Metrics in academia:
- Individual
- Group
- Departmental
- University level

Larger integration implies more stability and less variability in the metrics over (shorter) time.

Page 9: Domain comparability

- Different science domains have different practices of doing research, and of publishing and discussing research results
- E.g. computer science – software engineering and software applications – uses predominantly conferences to present new results
- E.g. biology – uses predominantly journals to present new results

Page 10: Normalisation

- Across most sciences the underlying generic pattern of publication and citation metrics is very similar – e.g. they all follow power-law distributions
- Normalised metrics: at the Leiden research metrics centre (CWTS), raw metrics are divided by domain-specific parameters to make them comparable across subjects (see the sketch below)
- The domain-specific parameters are calculated by analysing domain-specific journals and papers (from high-impact multi-domain journals) – journal classification
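As an illustration of the general idea (this is a simplification, not the actual CWTS/Leiden methodology; the reference averages and paper data are invented), a field-normalised citation score divides each paper's citation count by the average for its field and year:

```python
# Sketch of field normalisation: each paper's citation count is divided
# by the average citation count of papers from the same field and year.
# All data below is hypothetical, for illustration only.

from statistics import mean

# reference averages: mean citations per paper, by (field, year)
field_year_avg = {
    ("biology", 2012): 14.2,
    ("computer science", 2012): 5.1,
}

papers = [
    {"field": "biology", "year": 2012, "citations": 28},
    {"field": "computer science", "year": 2012, "citations": 10},
]

def normalised_score(paper):
    """Citations relative to the field/year average (1.0 = field average)."""
    return paper["citations"] / field_year_avg[(paper["field"], paper["year"])]

scores = [normalised_score(p) for p in papers]
print(scores)        # ~[1.97, 1.96] – comparable despite the raw count gap
print(mean(scores))  # mean normalised citation score for the set
```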

Page 11: Domain nature

Domain-specific public opinion:
- How are new results presented?
- How are they discussed?
- How are they referenced?
- What are the primary and preferred venues?
- How is reputation established and maintained – for individuals, groups and institutes?
- What are the socio-dynamics of the domain?
- What are the ways of 'experimental' validation?

Page 12: Distortion factors

- Review papers – very highly cited
- Methods papers – the most highly cited
- Self-citations – important at low counts, ignorable above medium counts (the thresholds are domain specific)
- Networking – citation cliques
- Spinning – relatively many highly cited papers contain strong claims that are refuted within a relatively short time
- 'Authority' – unjustified benefits from already established authority status

Page 13: Variability and noise

- Metrics data may be highly variable – especially at the individual or small-group level, and also if too short time periods are considered
- Structural noise is caused by the mechanisms of science – e.g. the time to the citation peak (domain specific)

Page 14: Common research metrics

- Publication and citation counts within a time window
- Individual and group-level h-index / g-index
- Patent counts and patent revenue value
- Research funding value
- Number of PhD students satisfying some quality minimum (e.g. submission of the thesis on time, publication of a journal paper)
- Science esteem evaluation – a measure of long-term reputation management and networking ability

Page 15: Metrics and quality

- How are metrics related to the research quality of an individual / group / department / university?
- Are metrics purely retrospective, or do they also have predictive value with respect to research quality?

Page 16: Evolution of science

- Karl Popper – continual additions to science, all additions being falsifiable and valid until falsified; revisions are caused by the falsification of previously accepted scientific knowledge
- Thomas Kuhn – normal science and revolutionary science; normal science grows by many little additions filling gaps in the current scientific knowledge, while revolutionary science reorganises science fundamentally and grows rapidly until it matures into normal science

Page 17: Progress in normal science

- Use existing knowledge to make highly likely predictions
- Check the validity of such predictions by appropriately designed 'experimental' tests
- E.g. demonstrating a statement or a new minor theorem in a well-established branch of mathematics

Page 18: Progress in revolutionary science

- Use existing knowledge to discover fundamental inconsistencies or relationships between previously unquestioned or believed-to-be-unrelated pieces of scientific knowledge
- See what most others cannot see, and re-shape the related science according to the new discovery
- E.g. non-Euclidean geometry (Bolyai, Lobachevsky), the independence of the continuum hypothesis (Gödel, Cohen), alternatives to the axiom of choice (Zermelo, Gödel, Cohen)

Page 19: Predictability of success – 1

Normal science:
- New hypotheses are about highly predictable outcomes
- The predictability of success is lower if the hypothesis is in an area where the experimental validation of earlier hypotheses is shaky – such areas lie at the boundary of normal science and are avoided by success-hungry researchers
- Most research is successful in filling a gap and adding new bits of knowledge to the science
- Most research is published and referenced by further publications – being more mainstream in the current science implies more expected citations

Page 20: Predictability of success – 2

Revolutionary science:
- Most hypotheses turn out to be wrong
- Successful revolutionary science is built on a large volume of unsuccessful tries and takes a very long time (see the time between Euclid and non-Euclidean geometry, or between axiomatic set theory and the alternatives to the axiom of choice)
- It may take a long time to be noticed, due to established authorities and authoritative views in the science
- There is a high risk of achieving no success, and of being largely ignored for a good while even if success is achieved
- Often it does not lead to publications, citations or reputation enhancement

Page 21: Extent of impact

- The expected impact is inversely correlated with the predictability of success
- Normal science has impact – however, spinning may create the impression of higher impact than the actual one; usually the impact is small, since it is about filling a (usually small) gap in the science
- Revolutionary science has major impact if it is successful; in that case it is likely to change (rewrite) a large part of the existing science, or to initiate a fundamentally new branch of science with rapid initial growth potential

Page 22: Domain maturity

- Maturity is indicated by the proportion of normal-science progress within the science
- Mature science: most progress is made by small additions following normal-science research; revolutionary events are rare and relatively marginal – e.g. physics
- Early-stage science: most progress follows the pattern of revolutionary-science discovery; normal-science progress happens for relatively short periods, and normal-science components are regularly undermined by new revolutionary science; discovering the same thing several times is not uncommon, and spinning may play an important role in establishing reputation – e.g. management science

Page 23: Domain-specific progress & impact

- Progress and impact depend on the maturity of the science
- A more mature science implies more work following the pattern of normal science
- A more mature science also implies relatively little impact for most new discoveries

Page 24: Domain public opinion – 1

- Science maturity determines the nature of the public opinion of the science
- Mature science is based on a stable and unchallenged body of scientific knowledge, which allows relatively few major novelties, implies well-established and standardised experimental validations, and implies relatively low impact for most discoveries
- Implications: archival-style publication in journals, the use of highly standardised language, large integrated scientific societies, and the use of spin to create the impression of revolutionariness and to achieve visibility

Page 25: Domain public opinion – 2

- Early-stage or less mature science allows regular 'revolutionary' novelties, has no unquestionable body of established knowledge, allows experimentation with validation methods, allows repeated re-discoveries, and implies relatively high impact for the truly successful revolutionary innovations
- This environment may facilitate more informal publication of results; it also implies fewer citations for most results, which turn out to be less revolutionary than expected; high impact may find its way along alternative routes – i.e. avoiding formal citations – and spinning may play an important role

Page 26: Measurability of normal science

- Normal science produces large-volume output in well-established venues (journals, conference proceedings)
- Normal-science papers are regularly cited by other such papers – papers in the currently most fashionable mainstream areas can gather many citations
- Judging the quality of usual normal-science publications is relatively straightforward; spinning usually has only short-term effects
- Research grants are frequently available for normal-science projects with highly predictable objectives

Page 27: Measurability of revolutionary science

- Revolutionary-science output is rare; most publications intended as 'revolutionary' are rejected or end up in obscure journals or conferences, and scientists working on revolutionary topics may refrain from publishing half-baked ideas
- Revolutionary-science papers may stay a 'sleeping beauty' for many years; they may get few citations for a long time, and may never get many citations (e.g. if follower papers collect most of the citations)
- It is hard to judge the quality of revolutionary-science work – it lies outside or at the boundaries of normal science, there are no standardised methods or well-established referential knowledge base, and the impact is highly unpredictable
- A very large part of science intended to be revolutionary leads to dead ends, although the rare truly revolutionary works have extremely high impact on their science
- Few research grants are available for revolutionary science, due to its unpredictable success and its out-of-the-ordinary methods and objectives; getting such grants often needs excessive spinning

Page 28: Metrics measure normal science

Research metrics measure almost exclusively normal science:
- Publications
- Citations
- Research funding
- PhD students

Page 29: Measuring revolutionary science

- It is very difficult to measure revolutionary science
- It usually becomes obvious that a work is part of revolutionary science well after the work is published – 5-25 or more years later, depending on the science domain
- Revolutionary-science work is confirmed to be truly revolutionary at, or close to, the point when the new scientific knowledge based on this work becomes normal science
- It is very difficult to measure the expected impact of scientific work meant to be revolutionary
- Revolutionary science is not measured by the usual research metrics (or at most it is measured to a very small extent before it becomes normal science)

Page 30: Quality and metrics – 1

- In the case of normal science, quality can be evaluated and predicted reliably – the methods are standardised and it is relatively easy to check that they are followed correctly, the expected results are highly predictable, and their impact is also well predictable
- Research metrics correlate well with research quality in the case of normal science

Page 31: Quality and metrics – 2

- It is difficult to compose valid short- and medium-term metrics for revolutionary science – long-term metrics work in the sense of evaluating whether the work led to the emergence of new normal science
- Quality is hard to assess because results and impact are not easily predictable – an indicator can be the extent of the gap that the intended discovery would bridge, but spinning may blur the reliability of the estimate of that gap
- Quality may be judged by others who are active in the same area and have a similarly wide range of knowledge, but even these judgements can easily be wrong
- There is not much relation between any common research metric and the quality of revolutionary research and its output

Page 32: Output and funding

In the short to medium term (3-5 years):
- Funding by QR and research council grants (in the UK) follows performance in output generation (primarily citation counts) – a significant correlation
- Output generation (publication and citation counts) depends on earlier output performance (primarily citation counts) and much less on changes in the level of research funding
- Research funding performance reflects earlier output performance (primarily normal science)
- Extra research funding does not necessarily imply extra output above the level predicted from earlier output

Page 33: Metrics for management

- How can we use research metrics in the context of management?
- What kinds of research metrics can be used?
- What can be achieved by using metrics?

Page 34: Management objectives

- Management objectives should be set in clear terms, should be achievable, and progress towards achievement should be measurable
- Metrics can help in formulating such research management objectives
- Objectives should be set after sufficiently clear measurement and evaluation of the current status

Page 35: Measuring progress and achievement – 1

- Research metrics are key to setting objectives with measurable progress in normal science
- Normal-science-based, long-term, retrospective measurement of revolutionary-science work is the main way of measuring progress and achievement in revolutionary science

Page 36: Measuring progress and achievement – 2

- Measurement should consider a large part of the spectrum of possible outcomes (in quality terms) at sufficient resolution (e.g. papers published in a wide range of venues, grant applications at all levels of rating)
- The context and methods of measurement should be established clearly and transparently (e.g. the set of considered journals and conferences, time windows, exclusion rules)

Page 37: Metrics as a management tool – 1

Individual metrics:
- Publication and citation counts and distributions (counted yearly) over 3 / 5 / 7 years (depending on status) – aim: increasing distribution graphs, and preferably an exponentially increasing citation graph
- Discounted publication and citation metrics – elimination of reviews, editorials, database reports, publications with excessively many authors, etc. – aim: similar to the above (see the sketch below)
- List of highly cited excluded publications over 3 / 5 / 7 years – the aim depends on the individual
- h-index and/or g-index – with and without excluded publications – change in the index over the last 3 / 5 / 7 years – aim: steady increase
- Research grants – count and yearly distribution of applications, distribution of evaluations, and the amount and yearly distribution of awarded grants over the last 3 / 5 / 7 years – aim: steady level or increase
- For top-flyers – comparison with the metrics of an agreed set of top-flyers elsewhere – this applies to metrics that can be calculated from common and accessible data sources
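A minimal sketch of the first two items in the list above (the data, function names and exclusion set are illustrative; the exclusion categories follow the slide):

```python
# Hypothetical sketch: per-year publication and citation distributions,
# plus a discounted variant that drops reviews, editorials, etc.

from collections import Counter

papers = [
    # (year, citations, kind) – invented example data
    (2019, 12, "article"), (2019, 30, "review"),
    (2020, 7, "article"), (2021, 4, "editorial"),
    (2021, 9, "article"), (2022, 2, "article"),
]

EXCLUDED_KINDS = {"review", "editorial", "database report"}

def yearly_counts(papers, discount=False):
    """Publication and citation counts per year, optionally discounted."""
    pubs, cits = Counter(), Counter()
    for year, citations, kind in papers:
        if discount and kind in EXCLUDED_KINDS:
            continue
        pubs[year] += 1
        cits[year] += citations
    return pubs, cits

print(yearly_counts(papers))                 # raw distributions
print(yearly_counts(papers, discount=True))  # discounted distributions
```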

Page 38: Metrics as a management tool – 2

Group & department metrics:
- The same set of metrics as for individuals, but counted for the group or department
- Distributions of individual metrics descriptors (e.g. publication and citation counts) over the group or department members – aim: a balanced distribution with a set of high-flyers
- Change in the above distributions over time – aim: steady or improved balance, with a steady or increasing count of high-flyers
- Cohesion metrics: member co-authorships of counted papers and grant applications, and the proportion of these within the whole measured set – aim: sufficient but not excessive co-working relations (see the sketch below)
- Comparison with the metrics of a set of departments considered close competitors – this applies to metrics that can be calculated from common and accessible data sources
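The cohesion metric might be sketched as follows (my own illustrative reading of the slide's definition, with invented data: the share of counted papers co-authored by at least two group members):

```python
# Hypothetical sketch of a co-authorship cohesion metric: the share of
# counted papers that have two or more group members as co-authors.

group = {"alice", "bob", "carol"}

papers = [
    {"alice", "bob"},            # internal co-authorship
    {"alice", "dave"},           # one member + an external co-author
    {"carol", "bob", "erin"},    # internal co-authorship
    {"bob"},                     # single-author paper
]

def cohesion(papers, group):
    """Fraction of papers co-authored by >= 2 group members."""
    co_authored = sum(1 for authors in papers if len(authors & group) >= 2)
    return co_authored / len(papers)

print(cohesion(papers, group))   # 0.5 – two of the four papers
```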

Page 39: Metrics as a management tool – 3

University metrics:
- The same metrics as for groups & departments
- Group distribution and cohesion metrics should be calculated by considering departments instead of individuals
- Comparison with the metrics of a set of universities considered close competitors – this applies to metrics that can be calculated from common and accessible data sources

Page 40: Setting targets

- Targets are short-term objectives – time limits/windows should be set very clearly
- Targets should be clearly defined, based on an evaluation of the current standing, achievable, and clearly measurable
- While targets apply to single years, they should be considered in the context of multi-year objectives

Page 41: Domain-specific impact

- The impact of research work is domain specific
- This should be considered when setting citation and related (e.g. h-index) targets at all levels:
  - Individual
  - Group & department
  - University

Page 42: Clarity and predictability

- It is critical that metrics are used in management with a clear setting of their context, methods, and evaluations
- It is also critical to have predictable career-related outcomes associated with metrics-based management evaluation
- It should be clear what gains/losses or rewards/punishments are associated with delivering the target/objective, and with its non-delivery, under-delivery or over-delivery

Page 43: Environment

- Work time allocations should explicitly consider the time needed for work towards research performance targets
- Metrics, evaluations, and the related gains/losses should be generally accepted as valid

Page 44: Competition and transparency

- Competition in achievement should be openly supported
- The performance evaluation process should be open and transparent
- Individuals / groups / departments should be able to compare themselves with others on the basis of openly available, timely data (considering normalisation for the science domain)

Page 45: Hype and spin

- The requirement for hype and spin is induced by the larger market for research results (e.g. publishing papers, getting patents)
- Ideally, hype and spin should be discounted in internal evaluations, but without damaging the enthusiasm of the researchers who hype and spin their work
- More successful, higher-reputation departments can afford stricter hype-and-spin discounting – the negative subjective effects of discounting are compensated by the shared reputation dividend

Page 46: Choice: normal vs. revolutionary science

- A choice about the mix of normal- and revolutionary-science work should be made at all levels (individual, group, department, university)
- Early stage: the preference should be for normal science, to establish reputation
- Revolutionary science may be very exciting and challenging, but too much of it may lead to a drop in, or a lack of, reputation

Page 47: Funding and impact expectation

- Funding and impact expectations depend on the mix of normal and revolutionary science
- The normal-science part makes the expectations more predictable, and also limits them
- The revolutionary-science part makes the expectations less predictable, but may also imply very high longer-term expectations

Page 48: Unit size – 1

- Revolutionary science implies high risk
- High risk is affordable only if it can be spread sufficiently widely – a research department size implication
- A small department may easily disappear by taking on too much risk
- Doing revolutionary science needs a sufficiently large department, one that can compensate for the risk by also doing a sufficient amount of low-risk normal-science work

Page 49: Unit size – 2

- Phase transition: revolutionary science needs to be embedded in very large departments that also do a large amount of normal science; departments not doing revolutionary science need to be smaller, otherwise they suffer too much inefficiency (e.g. inability to provide sufficient career perspectives for many members, too much overlap, too much hype-driven competition)
- Transition between the two phases is possible

Page 50: Managing normal science

- Managing normal-science work is possible using metrics
- Since most of the well-established sciences contain mostly normal science, individuals / groups / departments / universities can use research-metrics-based management to a large extent

Page 51: Managing revolutionary science – 1

- Revolutionary science is very difficult to measure, and can be measured reliably only by assessing its normal-science impact over the long or very long term
- Investment in revolutionary science is very risky, and most of it has no return
- Managing revolutionary-science work may be possible to some extent on the basis of the domain knowledge of the established leaders of the domain – they may choose more junior colleagues deemed able and ready to get onto the track of revolutionary science – this is also a high-risk decision for these younger scientists (they may end up with very little, and may realise this only after a long time)

Page 52: Managing revolutionary science – 2

- Revolutionary-science investment may be based on the long-term revolutionary-science track record of a department or university; it should be noted, though, that such a track record is no guarantee of successful continuation – however, consecutive generations of revolutionary scientists may indicate that at least the previous generations knew how to select their followers

Page 53: HR implications – normal science

- Mature science domains with sustained growth potential need many (possibly increasingly many) researchers who deliver results for normal-science progress
- The ability distribution of the pool of potential researchers does not change over a short time, implying that as more potential researchers find work as scientists, an increasing number of less able researchers work on normal-science topics, and the average ability of researchers working in the domain is lowered
- The increasing standardisation of normal-science research compensates for the drop in average ability, making it possible for the less able researchers to achieve good results as well

Page 54: HR implications – revolutionary science – 1

- As the body of normal science grows, a longer time is required for a researcher to show proficiency in producing normal science, which is required for establishing a reputation
- The benefits of doing normal science may attract many who in principle would be able to do more than that in terms of revolutionary-science discoveries – as the science grows, the likelihood of finding such significant discoveries falls, increasing the chance that the ability to make revolutionary discoveries will be wasted in the end

Page 55: HR implications – revolutionary science – 2

- Researchers working on revolutionary topics may become the funding and impact ballast of groups and departments, which may lead to pressure on them to switch to topics more rewarding in funding and impact terms
- Finding the right candidates for revolutionary work becomes increasingly difficult as delivering results in normal science becomes easier and the spin-noise of pretended 'revolutionariness' becomes louder

Page 56: Summary – 1

- Most of what can be measured to compile research metrics belongs to normal science
- Revolutionary-science work disappears in the flood of normal science until around the point when it starts to turn into normal science
- Research metrics can be used to measure quality in the context of normal-science performance
- In the case of revolutionary-science performance, the metrics work only in the long term and in retrospect

Page 57: Summary – 2

- Clarity, predictability, transparency, appropriate hype discounting, and competition are critical for the management of research
- Normal science is easier to practise, more researchers are available to do it, and its impact and expected funding are more predictable than in the case of revolutionary science
- Revolutionary science becomes increasingly risky as the body of normal science grows, but its potential impact also becomes increasingly large, while finding the right researchers to do it becomes increasingly difficult

Page 58: Summary – 3

- Normal-science research is required in all sciences, and to an increasing extent as the science matures
- Having a sufficient amount of good-quality normal-science research provides predictability and safety (in funding, impact and reputation terms) for a research department

Page 59: Summary – 4

- To cultivate revolutionary science, the management of the institution should make clear long-term decisions about entering such a risky venture – size matters
- The expectations about impact, funding and personnel should be clear and should be set within realistic time frames (depending on the science domain)
- The associated risk of non-delivery, the lack of ability to monitor progress closely, and the autonomous funding requirements should be clear and acceptable to the management