evaluation in computer science.diaz/transpes/geneve.pdf · the case of computer science computer...

Evaluation in ComputerScience.

J. D́ıaz (UPC)

Why to evaluate in Academia?

I Hiring

I Promotion

I Grant evaluation

Whom to evaluate in Academia?

I Institutions

I Research Groups

I Individual Scientists

I Individual Papers

I Journals

Whom to evaluate in Academia?

Institutions⇑

Research Groups.⇑

Individual Scientists⇑

Individual Papers⇑

Journals

How to evaluate?

Research is too important to rely on subjectivejudgments.

Evaluation must be:

I Objective

I Simple

I Easy to implement and implement

I Minimal time consuming

How to evaluate?

Used to be Peer Review, tends towards Bibliometrics asprimary source.

Is that the correct way?

Research is too important to measureits value using bibliometrics alone

Bibliometrics

Main and most used:

I ISI-web-of-knowledge (Thomson-Reuters) (subscription based)Evaluates journals, papers appeared in a DB of 9000peer-reviewed journals, and researcher productivity forpublications in journals, in all fields of science.It uses their own DB.

I Scopus (Elsevier) (subscription based)Indexes 16000 peer-reviewed journals in most fields of science.

I Google Scholar (free)Web search engine that retrieves information for all accessibledocuments in the web. It includes all documents available inthe web.Prone to all kinds of errors: (use of Tech. Reports,problems with non-english names, ...)

Bibliometrics

I CiteSeer (CiteSeerX (β)) (PennState) (free).It crawls open access DB, as DBLP. Evaluates and usesjournal, conferences and books.Focuses on Computer and Information Science.Similar problems to Google Scholar.

Other bibliometric search engines: Scirus, getCITED, etc...

Evaluation of Journals

Impact Factor (IF): (Garfield Factor)The IF of J in 2008= number of references during 2008 to paperspublished in J during the t previous years, divided by the totalnumber of papers published in J during the same period.

In Garfield proposal h = 2, which is good for biological and medicaljournals, but it is equivocal for CS and Math.Scopus: uses only journals and conferences after 1994.ISI-web-of-knowledge: uses only journals and a 5 year period.Recently, started to incorporate conferences.Google-Scholar: uses Journals, Conferences and Technical Reports.Counts self-references.

Evaluation of Journals

Impact Factor (IF): (Garfield Factor)The IF of J in 2008= number of references during 2008 to paperspublished in J during the t previous years, divided by the totalnumber of papers published in J during the same period.In Garfield proposal h = 2, which is good for biological and medicaljournals, but it is equivocal for CS and Math.Scopus: uses only journals and conferences after 1994.ISI-web-of-knowledge: uses only journals and a 5 year period.Recently, started to incorporate conferences.Google-Scholar: uses Journals, Conferences and Technical Reports.Counts self-references.

Evaluation of Researcher

Based on counting number of citations of the researcherThe Hirsch number (H-index): A researcher has H-index=h if h ofhis papers are cited at least each h times each.

5 publications with more or equal to 5 cites

papers

cites

h=5

h5

Jorge Hirsh gives empirical evidence that all Nobel Prizes havelarge H-index.

Today’s trend is to use basically the H-index to evaluate CV atUniversities and Research Institutions (tenures, promotions,...) andeven to rate Universities

Evaluation of Researcher

Computation of H-index:ISI-web-of-knowledge:ScopusScholar Index (Google Scholar)

g-index, AW-index (age-weighted citation rate), etc..

Publish or Perish (uses Google Scholar as underling support)

A large body of literature has been developed that explains thedangers of using the described bibliometrics for evaluation ofJournals, Researchers and Institutions.

Show me the data M. Rossner, H. Van Epps , E. Hill.J. Cell Biology, 179, (6), pp. 1091’1092, (2007).

Citation Statistics. Adler, Erwing, Taylor . Joint Report from(IMU) (ICIAM) (IMS). 6/11/2008.

The case of Computer ScienceComputer science has as much to do with computers asastronomy has to do with telescopes.

Edsger Dijkstra

Computer Science is the only discipline that cannot bedefined in a single sentence.

Christos Papadimitriou

Biological Sc.

CS

Physics

Electrical Eng.

Operation Research

Economics

Mathematics

Social Sciences

The case of Computer Science: Peculiarities

The format and relevance of conferences in CS:Big difference with prestigious conferences in related fieldsMathematics, Physics, etc.The heterogeneity of the field: Informatics is science andengineering.Could be analytical and experimentalSometimes, it is difficult for people working in one area tounderstand work done in other areas of CS. For ex. people workingin Algorithmics and in Algorithmic Engineering.This has created small clusters of work with differentmethodologies and goals.

Conferences in CSI IJCAI (Art. Intell.): 16%I ECAI (Art. Intell.): 26%I ICSE (Soft. Engineer.): 13%I ICDCS (Distributed Comp.): 15%I OOPSLA (Object Technology): 19%I POPL (Prog. Lang.): 19%I AAMAS (Agents): 22%I ICCAD (Graphics): 26%I ICML (Mach. Learning): 27%I ICRA (Robotics): 43%I ICDM (Data Mining): 20%I CVPR (Vision- Recognition): 4%I INFOCOM (Computer Communicat.): 17%I DAC (Design autom.): 24%I IPDPS (Parallel-Dist.): 26%I STOC (Theory): 25%I SODA (Theory): 29%I ICALP (Theory): 33%

Conferences in CS

For CS, conferences should be fully introduced into the DB

Scopus, Google already do, ISI-web-kn is starting to do it.BUT

I Proliferation of conferences exchanging payment andacceptance (ex. WSEAS series)

I Even in the best conferences, plenty of wrong papers:Too many papers to handle by the same PC member in shorttime (use students as referees).For most papers, it is difficult to fit all details in 10 pages.

Conferences in CS

For CS, conferences should be fully introduced into the DBScopus, Google already do, ISI-web-kn is starting to do it.BUT

I Proliferation of conferences exchanging payment andacceptance (ex. WSEAS series)

I Even in the best conferences, plenty of wrong papers:Too many papers to handle by the same PC member in shorttime (use students as referees).For most papers, it is difficult to fit all details in 10 pages.

Example: A. Broder in a STOC-98 paper, used Markov chainMonte-Carlo methods to approximate hard counting problems.

M. Mihail in IPL-99, proved a key step in Broder’s paper was wrongM. Jerrum and A. Sinclair in FOCS-99 fixed the convergence usinga completely different proof.I consider that the paper of A. Broder, even wrong, is one of themost important contributions to TCS in the last 10 yearsPapers in good conferences must be considered mainly by theirnovel ideas. Most of the papers make it into long journal versions.

Example: A. Broder in a STOC-98 paper, used Markov chainMonte-Carlo methods to approximate hard counting problems.M. Mihail in IPL-99, proved a key step in Broder’s paper was wrong

M. Jerrum and A. Sinclair in FOCS-99 fixed the convergence usinga completely different proof.I consider that the paper of A. Broder, even wrong, is one of themost important contributions to TCS in the last 10 yearsPapers in good conferences must be considered mainly by theirnovel ideas. Most of the papers make it into long journal versions.

Example: A. Broder in a STOC-98 paper, used Markov chainMonte-Carlo methods to approximate hard counting problems.M. Mihail in IPL-99, proved a key step in Broder’s paper was wrongM. Jerrum and A. Sinclair in FOCS-99 fixed the convergence usinga completely different proof.

I consider that the paper of A. Broder, even wrong, is one of themost important contributions to TCS in the last 10 yearsPapers in good conferences must be considered mainly by theirnovel ideas. Most of the papers make it into long journal versions.

Example: A. Broder in a STOC-98 paper, used Markov chainMonte-Carlo methods to approximate hard counting problems.M. Mihail in IPL-99, proved a key step in Broder’s paper was wrongM. Jerrum and A. Sinclair in FOCS-99 fixed the convergence usinga completely different proof.I consider that the paper of A. Broder, even wrong, is one of themost important contributions to TCS in the last 10 years

Papers in good conferences must be considered mainly by theirnovel ideas. Most of the papers make it into long journal versions.

Example: A. Broder in a STOC-98 paper, used Markov chainMonte-Carlo methods to approximate hard counting problems.M. Mihail in IPL-99, proved a key step in Broder’s paper was wrongM. Jerrum and A. Sinclair in FOCS-99 fixed the convergence usinga completely different proof.I consider that the paper of A. Broder, even wrong, is one of themost important contributions to TCS in the last 10 yearsPapers in good conferences must be considered mainly by theirnovel ideas. Most of the papers make it into long journal versions.

The IF in ”Theory” CS Journals

The IF of 3 main Theory CS JournalsData from SCOPUS on 3 Theory Journals: JACM, SIAMJC, TCS

TCS

0

1000

2000

3000

4000

5000

6000

2000 2001 2002 2003 2004 2005 2006 2007 2008

SIAM J Comp

JACM

600

02000 2001 2002 2003 2004 2005 2006 2007 2008

SIAM J Comp

JACM

TCS

40

80

120

Number Papers Published

160

300

The IF of CS Journals

Evaluation of the individual research in CS

The list of the 20 highest H-number for CS scholars (JensPalsberg):(89) Hector Garcia-Molina (Stanford), (87) Jeffrey D. Ullman(Stanford), (82) Robert Tarjan (Princeton), (80) Deborah Estrin(UCLA), (79) C. Papadimitriou (Berkeley), (77) Don Towsley (UMass, Amherst), (73) Ian Foster (Argonne N. L.), (71) ScottShenker (Berkeley), (70) David Culler (Berkeley), (70) DavidHaussler (Santa Cruz) (68) Takeo Kanade (CMU), (61) MarioGerla (UCLA), (61) Nick Jennings (U Southampton), (58) LucaCardelli (Microsoft), (58) Anil K. Jain (Michigan S U), (57) MartinAbadi ( Microsoft), (57) Sushil Jajodia (George Mason U), (57)Leslie Lamport (Microsoft), (57) Demetri Terzopoulos (UCLA),(56) Randy H. Katz (Berkeley),

Turing awards

2007 Clarke-Emerson-Sifakis 1993 Hartmanis-Stearns 1979 K. Iverson

2006 F.Allen 1992 B.W.Lampson 1978 R. Floydv

2005 P. Naur 1991 A.J.Milnerv 1977 J. Backus

2004 Cerf-Kahn 1990 F.Corbato 1976 Rabin-Scott

2003 A.Kay 1989 W.Kahan 1975 Newell-Simon

2002 Adleman-Rivest-Shamir 1988 I.Sutherland 1974 D. Knuth

2001 Dahl-Nygaard 1987 J.Cocke 1973 C.W.Bachman

2000 A.Yao 1986 Hopcroft-Tarjan 1972 E.W.Dijkstra

1999 F.P.Brooks 1985 RM. Karp 1971 J. McCarthy

1998 Jim Gray 1984 N.Wirth 1970 J. Wilkinson

1997 D.Engelbart 1983 Ritchie-Thompson 1969 M. Minsky

1996 A. Pnueli 1982 S.Cook 1968 R. Hamming

1995 M. Blum 1981 E.F.Codd 1967 M.V.Wilkes

1994 Feigenbaum-Reddy 1980 CAR.Hoare 1966 A.J. Perlis

The IF of ”general” CS researchers

Person X H-index:ISI-web-of-knowledge: 48Scopus: 5Scholar: 42

X =

Personal Conclusions 1:

On Journals and Conferences

I The IF is useful as a first approximation.

I DB should be pruned of self-references.

I Editorial Boards should be professionalizedEd. Boards members should receive a fee for the jobReferees should receive an honorarium for a good jobStrict honorability rules should be implemented for behavior inEditorial Boards and Program Committees.

I There should be an IF for good CS conferences

I The expertise of the people working in a specific field, is themost important measure of evaluation for a given Journal andConference (assuming the honesty of the researcher!)

Personal Conclusions 2:On Research Evaluation

I Bibliometrics as the H-number are good as a lower bound onthe capacity of researchers, but it is pretty much helplessevaluating the research abilities above a certain threshold.

I Parameters as the H-number could be improved by returningto computing it in an ad-hoc manner (difficult)

I In Bibliometrics should incorporate a ”negative” score forfacts as excessive self-referencing and plagiarism (two of themodern plagues affecting CS research). This is best done bycareful interpretation of the existing DATA.

I In CS recognition of seminal ideas could take a long time.I The fashion for publishing as fast as possible and as much as

possible should be reversed in CS. In that sense, theevaluation must focus in: novel ideas, steps forward, noveltechniques, etc.A ”wrong paper” with interesting ideas that stimulate plentyof researchers is more important for the progress of CS than apaper who nobody cares even to validate its results.

I The design of good useful software should be incorporatedinto the evaluation of researchers in CS.

Personal Conclusions 3:

On the Researchers

I The fashion of publishing as fast as possible and as much aspossible should be reversed in CS. In the long run, a few goodideas will be better for your reputation than 100 irrelevantpublications.

I As John McCarthy says: Excessive self reference is often asdangerous as smoking.

I Plagiarism and self-plagiarism is the most dangerous game forthe self reputation.

I Concurrent multitasking of Editorial Boards and ProgramCommittees is dangerous for the fairness of the process.

Thank you.

evaluation in computer science.diaz/transpes/geneve.pdf · the case of computer science computer...

Documents