evaluation in computer science.diaz/transpes/geneve.pdf · the case of computer science computer...
TRANSCRIPT
Evaluation in ComputerScience.
J. D́ıaz (UPC)
Why to evaluate in Academia?
I Hiring
I Promotion
I Grant evaluation
Whom to evaluate in Academia?
I Institutions
I Research Groups
I Individual Scientists
I Individual Papers
I Journals
Whom to evaluate in Academia?
Institutions⇑
Research Groups.⇑
Individual Scientists⇑
Individual Papers⇑
Journals
How to evaluate?
Research is too important to rely on subjectivejudgments.
Evaluation must be:
I Objective
I Simple
I Easy to implement and implement
I Minimal time consuming
How to evaluate?
Research is too important to rely on subjectivejudgments.
Evaluation must be:
I Objective
I Simple
I Easy to implement and implement
I Minimal time consuming
How to evaluate?
Used to be Peer Review, tends towards Bibliometrics asprimary source.
Is that the correct way?
Research is too important to measureits value using bibliometrics alone
How to evaluate?
Used to be Peer Review, tends towards Bibliometrics asprimary source.
Is that the correct way?
Research is too important to measureits value using bibliometrics alone
Bibliometrics
Main and most used:
I ISI-web-of-knowledge (Thomson-Reuters) (subscription based)Evaluates journals, papers appeared in a DB of 9000peer-reviewed journals, and researcher productivity forpublications in journals, in all fields of science.It uses their own DB.
I Scopus (Elsevier) (subscription based)Indexes 16000 peer-reviewed journals in most fields of science.
I Google Scholar (free)Web search engine that retrieves information for all accessibledocuments in the web. It includes all documents available inthe web.Prone to all kinds of errors: (use of Tech. Reports,problems with non-english names, ...)
Bibliometrics
I CiteSeer (CiteSeerX (β)) (PennState) (free).It crawls open access DB, as DBLP. Evaluates and usesjournal, conferences and books.Focuses on Computer and Information Science.Similar problems to Google Scholar.
Other bibliometric search engines: Scirus, getCITED, etc...
Bibliometrics
I CiteSeer (CiteSeerX (β)) (PennState) (free).It crawls open access DB, as DBLP. Evaluates and usesjournal, conferences and books.Focuses on Computer and Information Science.Similar problems to Google Scholar.
Other bibliometric search engines: Scirus, getCITED, etc...
Evaluation of Journals
Impact Factor (IF): (Garfield Factor)The IF of J in 2008= number of references during 2008 to paperspublished in J during the t previous years, divided by the totalnumber of papers published in J during the same period.
In Garfield proposal h = 2, which is good for biological and medicaljournals, but it is equivocal for CS and Math.Scopus: uses only journals and conferences after 1994.ISI-web-of-knowledge: uses only journals and a 5 year period.Recently, started to incorporate conferences.Google-Scholar: uses Journals, Conferences and Technical Reports.Counts self-references.
Evaluation of Journals
Impact Factor (IF): (Garfield Factor)The IF of J in 2008= number of references during 2008 to paperspublished in J during the t previous years, divided by the totalnumber of papers published in J during the same period.In Garfield proposal h = 2, which is good for biological and medicaljournals, but it is equivocal for CS and Math.Scopus: uses only journals and conferences after 1994.ISI-web-of-knowledge: uses only journals and a 5 year period.Recently, started to incorporate conferences.Google-Scholar: uses Journals, Conferences and Technical Reports.Counts self-references.
Evaluation of Researcher
Based on counting number of citations of the researcherThe Hirsch number (H-index): A researcher has H-index=h if h ofhis papers are cited at least each h times each.
5 publications with more or equal to 5 cites
papers
cites
h=5
h5
Jorge Hirsh gives empirical evidence that all Nobel Prizes havelarge H-index.
Today’s trend is to use basically the H-index to evaluate CV atUniversities and Research Institutions (tenures, promotions,...) andeven to rate Universities
Evaluation of Researcher
Computation of H-index:ISI-web-of-knowledge:ScopusScholar Index (Google Scholar)
g-index, AW-index (age-weighted citation rate), etc..
Publish or Perish (uses Google Scholar as underling support)
Evaluation of Researcher
Computation of H-index:ISI-web-of-knowledge:ScopusScholar Index (Google Scholar)
g-index, AW-index (age-weighted citation rate), etc..
Publish or Perish (uses Google Scholar as underling support)
A large body of literature has been developed that explains thedangers of using the described bibliometrics for evaluation ofJournals, Researchers and Institutions.
Show me the data M. Rossner, H. Van Epps , E. Hill.J. Cell Biology, 179, (6), pp. 1091’1092, (2007).
Citation Statistics. Adler, Erwing, Taylor . Joint Report from(IMU) (ICIAM) (IMS). 6/11/2008.
The case of Computer ScienceComputer science has as much to do with computers asastronomy has to do with telescopes.
Edsger Dijkstra
Computer Science is the only discipline that cannot bedefined in a single sentence.
Christos Papadimitriou
Biological Sc.
CS
Physics
Electrical Eng.
Operation Research
Economics
Mathematics
Social Sciences
The case of Computer Science: Peculiarities
The format and relevance of conferences in CS:Big difference with prestigious conferences in related fieldsMathematics, Physics, etc.The heterogeneity of the field: Informatics is science andengineering.Could be analytical and experimentalSometimes, it is difficult for people working in one area tounderstand work done in other areas of CS. For ex. people workingin Algorithmics and in Algorithmic Engineering.This has created small clusters of work with differentmethodologies and goals.
Conferences in CSI IJCAI (Art. Intell.): 16%I ECAI (Art. Intell.): 26%I ICSE (Soft. Engineer.): 13%I ICDCS (Distributed Comp.): 15%I OOPSLA (Object Technology): 19%I POPL (Prog. Lang.): 19%I AAMAS (Agents): 22%I ICCAD (Graphics): 26%I ICML (Mach. Learning): 27%I ICRA (Robotics): 43%I ICDM (Data Mining): 20%I CVPR (Vision- Recognition): 4%I INFOCOM (Computer Communicat.): 17%I DAC (Design autom.): 24%I IPDPS (Parallel-Dist.): 26%I STOC (Theory): 25%I SODA (Theory): 29%I ICALP (Theory): 33%
Conferences in CS
For CS, conferences should be fully introduced into the DB
Scopus, Google already do, ISI-web-kn is starting to do it.BUT
I Proliferation of conferences exchanging payment andacceptance (ex. WSEAS series)
I Even in the best conferences, plenty of wrong papers:Too many papers to handle by the same PC member in shorttime (use students as referees).For most papers, it is difficult to fit all details in 10 pages.
Conferences in CS
For CS, conferences should be fully introduced into the DBScopus, Google already do, ISI-web-kn is starting to do it.BUT
I Proliferation of conferences exchanging payment andacceptance (ex. WSEAS series)
I Even in the best conferences, plenty of wrong papers:Too many papers to handle by the same PC member in shorttime (use students as referees).For most papers, it is difficult to fit all details in 10 pages.
Conferences in CS
For CS, conferences should be fully introduced into the DBScopus, Google already do, ISI-web-kn is starting to do it.BUT
I Proliferation of conferences exchanging payment andacceptance (ex. WSEAS series)
I Even in the best conferences, plenty of wrong papers:Too many papers to handle by the same PC member in shorttime (use students as referees).For most papers, it is difficult to fit all details in 10 pages.
Example: A. Broder in a STOC-98 paper, used Markov chainMonte-Carlo methods to approximate hard counting problems.
M. Mihail in IPL-99, proved a key step in Broder’s paper was wrongM. Jerrum and A. Sinclair in FOCS-99 fixed the convergence usinga completely different proof.I consider that the paper of A. Broder, even wrong, is one of themost important contributions to TCS in the last 10 yearsPapers in good conferences must be considered mainly by theirnovel ideas. Most of the papers make it into long journal versions.
Example: A. Broder in a STOC-98 paper, used Markov chainMonte-Carlo methods to approximate hard counting problems.M. Mihail in IPL-99, proved a key step in Broder’s paper was wrong
M. Jerrum and A. Sinclair in FOCS-99 fixed the convergence usinga completely different proof.I consider that the paper of A. Broder, even wrong, is one of themost important contributions to TCS in the last 10 yearsPapers in good conferences must be considered mainly by theirnovel ideas. Most of the papers make it into long journal versions.
Example: A. Broder in a STOC-98 paper, used Markov chainMonte-Carlo methods to approximate hard counting problems.M. Mihail in IPL-99, proved a key step in Broder’s paper was wrongM. Jerrum and A. Sinclair in FOCS-99 fixed the convergence usinga completely different proof.
I consider that the paper of A. Broder, even wrong, is one of themost important contributions to TCS in the last 10 yearsPapers in good conferences must be considered mainly by theirnovel ideas. Most of the papers make it into long journal versions.
Example: A. Broder in a STOC-98 paper, used Markov chainMonte-Carlo methods to approximate hard counting problems.M. Mihail in IPL-99, proved a key step in Broder’s paper was wrongM. Jerrum and A. Sinclair in FOCS-99 fixed the convergence usinga completely different proof.I consider that the paper of A. Broder, even wrong, is one of themost important contributions to TCS in the last 10 years
Papers in good conferences must be considered mainly by theirnovel ideas. Most of the papers make it into long journal versions.
Example: A. Broder in a STOC-98 paper, used Markov chainMonte-Carlo methods to approximate hard counting problems.M. Mihail in IPL-99, proved a key step in Broder’s paper was wrongM. Jerrum and A. Sinclair in FOCS-99 fixed the convergence usinga completely different proof.I consider that the paper of A. Broder, even wrong, is one of themost important contributions to TCS in the last 10 yearsPapers in good conferences must be considered mainly by theirnovel ideas. Most of the papers make it into long journal versions.
The IF in ”Theory” CS Journals
The IF of 3 main Theory CS JournalsData from SCOPUS on 3 Theory Journals: JACM, SIAMJC, TCS
TCS
0
1000
2000
3000
4000
5000
6000
2000 2001 2002 2003 2004 2005 2006 2007 2008
SIAM J Comp
JACM
600
02000 2001 2002 2003 2004 2005 2006 2007 2008
SIAM J Comp
JACM
TCS
40
80
120
Number Papers Published
160
300
The IF of 3 main Theory CS JournalsData from SCOPUS on 3 Theory Journals: JACM, SIAMJC, TCS
TCS
0
1000
2000
3000
4000
5000
6000
2000 2001 2002 2003 2004 2005 2006 2007 2008
SIAM J Comp
JACM
600
02000 2001 2002 2003 2004 2005 2006 2007 2008
SIAM J Comp
JACM
TCS
40
80
120
Number Papers Published
160
300
The IF of 3 main Theory CS JournalsData from SCOPUS on 3 Theory Journals: JACM, SIAMJC, TCS
TCS
0
1000
2000
3000
4000
5000
6000
2000 2001 2002 2003 2004 2005 2006 2007 2008
SIAM J Comp
JACM
600
02000 2001 2002 2003 2004 2005 2006 2007 2008
SIAM J Comp
JACM
TCS
40
80
120
Number Papers Published
160
300
The IF of CS Journals
Evaluation of the individual research in CS
The list of the 20 highest H-number for CS scholars (JensPalsberg):(89) Hector Garcia-Molina (Stanford), (87) Jeffrey D. Ullman(Stanford), (82) Robert Tarjan (Princeton), (80) Deborah Estrin(UCLA), (79) C. Papadimitriou (Berkeley), (77) Don Towsley (UMass, Amherst), (73) Ian Foster (Argonne N. L.), (71) ScottShenker (Berkeley), (70) David Culler (Berkeley), (70) DavidHaussler (Santa Cruz) (68) Takeo Kanade (CMU), (61) MarioGerla (UCLA), (61) Nick Jennings (U Southampton), (58) LucaCardelli (Microsoft), (58) Anil K. Jain (Michigan S U), (57) MartinAbadi ( Microsoft), (57) Sushil Jajodia (George Mason U), (57)Leslie Lamport (Microsoft), (57) Demetri Terzopoulos (UCLA),(56) Randy H. Katz (Berkeley),
Turing awards
2007 Clarke-Emerson-Sifakis 1993 Hartmanis-Stearns 1979 K. Iverson
2006 F.Allen 1992 B.W.Lampson 1978 R. Floydv
2005 P. Naur 1991 A.J.Milnerv 1977 J. Backus
2004 Cerf-Kahn 1990 F.Corbato 1976 Rabin-Scott
2003 A.Kay 1989 W.Kahan 1975 Newell-Simon
2002 Adleman-Rivest-Shamir 1988 I.Sutherland 1974 D. Knuth
2001 Dahl-Nygaard 1987 J.Cocke 1973 C.W.Bachman
2000 A.Yao 1986 Hopcroft-Tarjan 1972 E.W.Dijkstra
1999 F.P.Brooks 1985 RM. Karp 1971 J. McCarthy
1998 Jim Gray 1984 N.Wirth 1970 J. Wilkinson
1997 D.Engelbart 1983 Ritchie-Thompson 1969 M. Minsky
1996 A. Pnueli 1982 S.Cook 1968 R. Hamming
1995 M. Blum 1981 E.F.Codd 1967 M.V.Wilkes
1994 Feigenbaum-Reddy 1980 CAR.Hoare 1966 A.J. Perlis
The IF of ”general” CS researchers
Person X H-index:ISI-web-of-knowledge: 48Scopus: 5Scholar: 42
X =
Person X H-index:ISI-web-of-knowledge: 48Scopus: 5Scholar: 42
X =
Personal Conclusions 1:
On Journals and Conferences
I The IF is useful as a first approximation.
I DB should be pruned of self-references.
I Editorial Boards should be professionalizedEd. Boards members should receive a fee for the jobReferees should receive an honorarium for a good jobStrict honorability rules should be implemented for behavior inEditorial Boards and Program Committees.
I There should be an IF for good CS conferences
I The expertise of the people working in a specific field, is themost important measure of evaluation for a given Journal andConference (assuming the honesty of the researcher!)
Personal Conclusions 1:
On Journals and Conferences
I The IF is useful as a first approximation.
I DB should be pruned of self-references.
I Editorial Boards should be professionalizedEd. Boards members should receive a fee for the jobReferees should receive an honorarium for a good jobStrict honorability rules should be implemented for behavior inEditorial Boards and Program Committees.
I There should be an IF for good CS conferences
I The expertise of the people working in a specific field, is themost important measure of evaluation for a given Journal andConference (assuming the honesty of the researcher!)
Personal Conclusions 2:On Research Evaluation
I Bibliometrics as the H-number are good as a lower bound onthe capacity of researchers, but it is pretty much helplessevaluating the research abilities above a certain threshold.
I Parameters as the H-number could be improved by returningto computing it in an ad-hoc manner (difficult)
I In Bibliometrics should incorporate a ”negative” score forfacts as excessive self-referencing and plagiarism (two of themodern plagues affecting CS research). This is best done bycareful interpretation of the existing DATA.
I In CS recognition of seminal ideas could take a long time.I The fashion for publishing as fast as possible and as much as
possible should be reversed in CS. In that sense, theevaluation must focus in: novel ideas, steps forward, noveltechniques, etc.A ”wrong paper” with interesting ideas that stimulate plentyof researchers is more important for the progress of CS than apaper who nobody cares even to validate its results.
I The design of good useful software should be incorporatedinto the evaluation of researchers in CS.
Personal Conclusions 3:
On the Researchers
I The fashion of publishing as fast as possible and as much aspossible should be reversed in CS. In the long run, a few goodideas will be better for your reputation than 100 irrelevantpublications.
I As John McCarthy says: Excessive self reference is often asdangerous as smoking.
I Plagiarism and self-plagiarism is the most dangerous game forthe self reputation.
I Concurrent multitasking of Editorial Boards and ProgramCommittees is dangerous for the fairness of the process.
Thank you.