A collaboration graph for E-LIS Thomas Krichel Long Island University & Novosibirsk State University & Open Library Society 3 November 2011

Introduction

• Thanks
  – To Ángel Sánchez Villegas, for the use of the e-lis domain.
  – To Tomas Baiget, who encouraged me to present here.
• Warnings
  – Data shown here were correct as of 1 November 2011.
  – I am glossing over some technical details.
  – Over 30 slides

overview

• Introduction to AuthorClaim
• Introduction to a co-authorship network based on restricting AuthorClaim to E-LIS documents
• Web interface and campaign

a known problem

• In publishing systems such as E-LIS, the authors are usually entered by name.

• It is well known that the name of an author does not identify an author:
  – there are multiple ways to express the name of the same person
  – multiple people may share one expression of their name

a tried solution

• One way to partially solve this problem is to have a system where authors can
  – claim papers that they have written
  – disclaim papers written by their homonyms

• The first system of this kind was the RePEc Author Service
  – created by Thomas Krichel in 1999
  – has now registered over 30,000 economists

AuthorClaim

• AuthorClaim is an interdisciplinary version of the RePEc Author Service.

• It was created by Thomas Krichel in 2008.
• It lives at http://authorclaim.org.
• Over 100,000,000 authorships of over 35,000,000 documents can be claimed.
• The E-LIS papers are among these documents.

445 E-LIS papers claimed by 36 authors

• 72 Tomas Baiget
• 61 Ulrich Herb
• 43 Antonella De Robbio
• 39 Thomas Krichel
• 26 Andrea Marchitelli & fernanda peset
• 20 Ross MacIntyre
• 16 Dirk Lewandowski
• 15 Bożena Bednarek-Michalska
• 14 Lidia Derfert-Wolf
• 11 Zeno Tajoli & Imma Subirats
• 9 Derek Law & Emma McCulloch & Philipp Mayr
• 8 Jeffrey Beall
• 7 nuria Lloret Lloret Romero
• 6 Benjamin John Keele
• 5 Adrian Pohl & Maria Francisca Abad-Garcia
• 4 Walther Umstaetter
• 3 Andrea Scharnhorst & Jose Manuel Barrueco & Thomas Hapke & Christian Hauschke & Klaus Graf
• 2 Frank Havemann & Eberhard R. Hilf & Bhojaraju Gunjal & Chris L. Awre
• 1 Loet Leydesdorff & Peter Bolles Hirtle & Alexei Botchkarev & Christina K. Pikas & Oliver Flimm & Sridhar Gutam
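The per-author figures in this list can be checked with a few lines of Python. They cover 36 names and add up to exactly 445, which suggests that the 445 counts claims, with a paper claimed by several authors counted once per claimant, rather than distinct papers.

```python
# Claim counts per author, copied from the list above: (count, authors with that count).
CLAIM_GROUPS = [
    (72, ["Tomas Baiget"]),
    (61, ["Ulrich Herb"]),
    (43, ["Antonella De Robbio"]),
    (39, ["Thomas Krichel"]),
    (26, ["Andrea Marchitelli", "fernanda peset"]),
    (20, ["Ross MacIntyre"]),
    (16, ["Dirk Lewandowski"]),
    (15, ["Bożena Bednarek-Michalska"]),
    (14, ["Lidia Derfert-Wolf"]),
    (11, ["Zeno Tajoli", "Imma Subirats"]),
    (9, ["Derek Law", "Emma McCulloch", "Philipp Mayr"]),
    (8, ["Jeffrey Beall"]),
    (7, ["nuria Lloret Lloret Romero"]),
    (6, ["Benjamin John Keele"]),
    (5, ["Adrian Pohl", "Maria Francisca Abad-Garcia"]),
    (4, ["Walther Umstaetter"]),
    (3, ["Andrea Scharnhorst", "Jose Manuel Barrueco", "Thomas Hapke",
         "Christian Hauschke", "Klaus Graf"]),
    (2, ["Frank Havemann", "Eberhard R. Hilf", "Bhojaraju Gunjal", "Chris L. Awre"]),
    (1, ["Loet Leydesdorff", "Peter Bolles Hirtle", "Alexei Botchkarev",
         "Christina K. Pikas", "Oliver Flimm", "Sridhar Gutam"]),
]

# Flatten the groups into one author -> number-of-claims mapping.
claims = {name: count for count, names in CLAIM_GROUPS for name in names}

print(len(claims), "authors,", sum(claims.values()), "claims")
```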

so far so good

• I don’t really want to talk about AuthorClaim, but about the services that we can build once we have identified authors.

• When we have this data, we can find out who has been writing papers with whom.

• In other words, we can study the co-authorship network.

co-authorship

• When two registered authors claim to have authored the same paper, we say that they are co-authors.

• The co-authorship relationship creates a link between the two authors.

• The link is symmetric, meaning that the fact that Thomas is a co-author of Imma means that Imma is a co-author of Thomas.
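A minimal Python sketch of how claim records could turn into co-authorship links. The paper identifiers and author labels are hypothetical placeholders, not AuthorClaim or E-LIS data; the only rules taken from the slide are that two authors who claim the same paper become linked, and that the link is symmetric.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical claim records: paper identifier -> set of authors who claimed it.
CLAIMS = {
    "paper-1": {"Author A", "Author B"},
    "paper-2": {"Author B", "Author C", "Author D"},
    "paper-3": {"Author E"},  # claimed by one author only, so it creates no link
}


def coauthor_graph(claims):
    """Return an undirected adjacency map: author -> set of co-authors."""
    graph = defaultdict(set)
    for authors in claims.values():
        # Every pair of claimants of the same paper become co-authors.
        for a, b in combinations(sorted(authors), 2):
            graph[a].add(b)
            graph[b].add(a)  # store both directions: the link is symmetric
    return dict(graph)


graph = coauthor_graph(CLAIMS)
for author in sorted(graph):
    print(author, "->", sorted(graph[author]))
```

Author E does not appear in the graph at all: an author without a co-claimed paper has no links, which is why the co-authorship network contains fewer authors than the list of claimants.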

58 papers have been co-claimed by 16 co-authors

• 12 fernanda peset
• 10 Tomas Baiget
• 8 Imma Subirats
• 6 Antonella De Robbio
• 4 nuria Lloret Lloret Romero
• 2 Andrea Marchitelli & Ulrich Herb & Ross MacIntyre & Bożena Bednarek-Michalska & Thomas Krichel & Dirk Lewandowski & Lidia Derfert-Wolf
• 1 Derek Law & Emma McCulloch & Sridhar Gutam & Philipp Mayr

network and components

• When we start with one co-author, move to her co-authors, then to their co-authors, and so on, which other authors can we reach?

• The set of all authors that we can reach from a given author by following co-authorship links is called a component of the network.

components in the network

• “Scottish”: Derek Law & Emma McCulloch
• “Polish”: Bożena Bednarek-Michalska & Lidia Derfert-Wolf
• “German”: Dirk Lewandowski & Sridhar Gutam & Philipp Mayr
• “Giant”: Andrea Marchitelli & Ulrich Herb & Thomas Krichel & Antonella De Robbio & fernanda peset & Imma Subirats & Ross MacIntyre & nuria Lloret Lloret Romero & Tomas Baiget
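Components can be found with a breadth-first search started from each author not yet visited. The Python sketch below relies on an assumed link list: each two-author component must consist of a single link, and the giant-component links are reconstructed from the distance and shortest-path slides later in this talk. The “German” component is left out because its internal links are not shown.

```python
from collections import deque

# Assumed co-authorship links, reconstructed from the slides (not exported data).
EDGES = [
    # "Scottish" and "Polish" components: two authors each, hence one link each
    ("Derek Law", "Emma McCulloch"),
    ("Bożena Bednarek-Michalska", "Lidia Derfert-Wolf"),
    # "Giant" component
    ("Tomas Baiget", "Thomas Krichel"),
    ("Tomas Baiget", "fernanda peset"),
    ("Tomas Baiget", "Imma Subirats"),
    ("Tomas Baiget", "Ulrich Herb"),
    ("fernanda peset", "nuria Lloret Lloret Romero"),
    ("Imma Subirats", "fernanda peset"),
    ("Imma Subirats", "Antonella De Robbio"),
    ("Imma Subirats", "Thomas Krichel"),
    ("Antonella De Robbio", "Thomas Krichel"),
    ("Antonella De Robbio", "Ross MacIntyre"),
    ("Antonella De Robbio", "Andrea Marchitelli"),
    ("Ross MacIntyre", "Andrea Marchitelli"),
]


def build_graph(edges):
    """Undirected adjacency map built from a list of links."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def components(graph):
    """Split the graph into sets of mutually reachable authors."""
    seen = set()
    result = []
    for start in graph:
        if start in seen:
            continue
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:  # breadth-first search from start
            author = queue.popleft()
            for other in graph[author]:
                if other not in seen:
                    seen.add(other)
                    component.add(other)
                    queue.append(other)
        result.append(component)
    return result


comps = sorted(components(build_graph(EDGES)), key=len, reverse=True)
giant = comps[0]
print([len(c) for c in comps])
print(sorted(giant))
```

Sorting by size and taking the first set picks out the giant component. Even with the three-author German component added back, the giant component (9 authors) is larger than the other three combined (7).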

the giant component

• The size of the giant component is larger than the combined size of all other components.

• It is very common in real-world networks for there to be a giant component.

• As the network grows, older small components join the giant component and new small components are created.

• We therefore study the giant component.

centrality

• Who is at the center of the E-LIS author network, i.e. the most central author in E-LIS?

• The answer is that it depends on how we measure centrality.

• Two measures are commonly used
  – closeness centrality
  – betweenness centrality

• Both depend on a measure of distance.

distance

• To understand these, we need a measure of distance.
  – We say that two authors have distance one if they are co-authors.
  – We say that two authors have distance two if they are not co-authors but have a common co-author.
  – And so on.

distances for Imma Subirats

• Tomas Baiget 1
• Antonella De Robbio 1
• Ulrich Herb 2
• Thomas Krichel 1
• nuria Lloret Lloret Romero 2
• Andrea Marchitelli 2
• Ross MacIntyre 2
• fernanda peset 1
• Imma Subirats 0

distances for Ulrich Herb

• Tomas Baiget 1
• Antonella De Robbio 3
• Ulrich Herb 0
• Thomas Krichel 2
• nuria Lloret Lloret Romero 3
• Andrea Marchitelli 4
• Ross MacIntyre 4
• fernanda peset 2
• Imma Subirats 2
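Distances from one author to all others come out of a single breadth-first search. A Python sketch, assuming the giant-component links listed in the code, which are reconstructed from the distance and shortest-path slides rather than taken from AuthorClaim; with them it reproduces the distances listed for Imma Subirats and Ulrich Herb.

```python
from collections import deque

# Giant-component links, reconstructed from the slides (an assumption).
EDGES = [
    ("Tomas Baiget", "Thomas Krichel"),
    ("Tomas Baiget", "fernanda peset"),
    ("Tomas Baiget", "Imma Subirats"),
    ("Tomas Baiget", "Ulrich Herb"),
    ("fernanda peset", "nuria Lloret Lloret Romero"),
    ("Imma Subirats", "fernanda peset"),
    ("Imma Subirats", "Antonella De Robbio"),
    ("Imma Subirats", "Thomas Krichel"),
    ("Antonella De Robbio", "Thomas Krichel"),
    ("Antonella De Robbio", "Ross MacIntyre"),
    ("Antonella De Robbio", "Andrea Marchitelli"),
    ("Ross MacIntyre", "Andrea Marchitelli"),
]


def build_graph(edges):
    """Undirected adjacency map built from a list of links."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def distances_from(graph, source):
    """Breadth-first search: co-authorship steps from source to each reachable author."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        author = queue.popleft()
        for other in graph[author]:
            if other not in dist:  # the first visit is along a shortest path
                dist[other] = dist[author] + 1
                queue.append(other)
    return dist


graph = build_graph(EDGES)
for name, d in sorted(distances_from(graph, "Imma Subirats").items()):
    print(name, d)
```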

closeness centrality

• The average distance of Imma is much smaller than that of Ulrich (1.5 against 2.625).

• In fact, we can calculate the average distance of every author from all other authors.

• This is what we call the closeness centrality of an author.
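Following this definition, closeness is the mean of an author's distances to everyone else in the giant component, so a lower value means a more central author. A Python sketch, again assuming giant-component links reconstructed from the distance and shortest-path slides rather than exported data:

```python
from collections import deque

# Giant-component links, reconstructed from the slides (an assumption).
EDGES = [
    ("Tomas Baiget", "Thomas Krichel"),
    ("Tomas Baiget", "fernanda peset"),
    ("Tomas Baiget", "Imma Subirats"),
    ("Tomas Baiget", "Ulrich Herb"),
    ("fernanda peset", "nuria Lloret Lloret Romero"),
    ("Imma Subirats", "fernanda peset"),
    ("Imma Subirats", "Antonella De Robbio"),
    ("Imma Subirats", "Thomas Krichel"),
    ("Antonella De Robbio", "Thomas Krichel"),
    ("Antonella De Robbio", "Ross MacIntyre"),
    ("Antonella De Robbio", "Andrea Marchitelli"),
    ("Ross MacIntyre", "Andrea Marchitelli"),
]


def build_graph(edges):
    """Undirected adjacency map built from a list of links."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def distances_from(graph, source):
    """Breadth-first search distances from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        author = queue.popleft()
        for other in graph[author]:
            if other not in dist:
                dist[other] = dist[author] + 1
                queue.append(other)
    return dist


def average_distance(graph, author):
    """Mean distance from author to every other author; assumes a connected graph."""
    dist = distances_from(graph, author)
    others = [d for name, d in dist.items() if name != author]
    return sum(others) / len(others)


graph = build_graph(EDGES)
for author in sorted(graph, key=lambda a: average_distance(graph, a)):
    print(f"{average_distance(graph, author):.3f} {author}")
```

With these links the values agree with the closeness ranking later in the talk. The sketch follows the talk's definition; some libraries, NetworkX among them, report the reciprocal instead, so that larger values mean more central.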

shortest paths

• To find the distance between two authors, we have to consider the paths between them.

• We need to find the shortest paths between them. There are well-known algorithms, such as breadth-first search, that find them without listing every possible path.

• The distance is the length of the shortest path.
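For unweighted links such as co-authorship, breadth-first search is enough: it visits authors in order of distance, and remembering from whom each author was first reached lets us read a shortest path backwards. In this Python sketch the links are reconstructed from the slides (an assumption), and ties between equally short paths are broken alphabetically; how icanis breaks ties is not stated.

```python
from collections import deque

# Giant-component links, reconstructed from the slides (an assumption).
EDGES = [
    ("Tomas Baiget", "Thomas Krichel"),
    ("Tomas Baiget", "fernanda peset"),
    ("Tomas Baiget", "Imma Subirats"),
    ("Tomas Baiget", "Ulrich Herb"),
    ("fernanda peset", "nuria Lloret Lloret Romero"),
    ("Imma Subirats", "fernanda peset"),
    ("Imma Subirats", "Antonella De Robbio"),
    ("Imma Subirats", "Thomas Krichel"),
    ("Antonella De Robbio", "Thomas Krichel"),
    ("Antonella De Robbio", "Ross MacIntyre"),
    ("Antonella De Robbio", "Andrea Marchitelli"),
    ("Ross MacIntyre", "Andrea Marchitelli"),
]


def build_graph(edges):
    """Undirected adjacency map built from a list of links."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, source, target):
    """Return one shortest path from source to target, or None if unreachable."""
    parent = {source: None}  # author -> author from whom it was first reached
    queue = deque([source])
    while queue:
        author = queue.popleft()
        if author == target:
            break
        for other in sorted(graph[author]):  # alphabetical tie-breaking
            if other not in parent:
                parent[other] = author
                queue.append(other)
    if target not in parent:
        return None
    # Walk back from the target to the source, then reverse.
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


graph = build_graph(EDGES)
print(" → ".join(shortest_path(graph, "Tomas Baiget", "Andrea Marchitelli")))
```

For this pair the sketch returns the same path as the slide of shortest paths from Tomas Baiget, but that agreement comes from the alphabetical tie-break chosen here, not from any documented rule.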

diameter

• When we have found all shortest paths, we can find the length of the longest shortest path between any two authors.

• This is called the diameter.
• In our network the diameter is four.
• This is small compared with the number of authors in the giant component (9).
• We say that our network has the “small world” property.
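Once every author's distances are known, the diameter is simply the largest of them. A Python sketch over giant-component links reconstructed from the slides (an assumption, not exported data); it also lists the pairs that are exactly four steps apart.

```python
from collections import deque

# Giant-component links, reconstructed from the slides (an assumption).
EDGES = [
    ("Tomas Baiget", "Thomas Krichel"),
    ("Tomas Baiget", "fernanda peset"),
    ("Tomas Baiget", "Imma Subirats"),
    ("Tomas Baiget", "Ulrich Herb"),
    ("fernanda peset", "nuria Lloret Lloret Romero"),
    ("Imma Subirats", "fernanda peset"),
    ("Imma Subirats", "Antonella De Robbio"),
    ("Imma Subirats", "Thomas Krichel"),
    ("Antonella De Robbio", "Thomas Krichel"),
    ("Antonella De Robbio", "Ross MacIntyre"),
    ("Antonella De Robbio", "Andrea Marchitelli"),
    ("Ross MacIntyre", "Andrea Marchitelli"),
]


def build_graph(edges):
    """Undirected adjacency map built from a list of links."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def distances_from(graph, source):
    """Breadth-first search distances from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        author = queue.popleft()
        for other in graph[author]:
            if other not in dist:
                dist[other] = dist[author] + 1
                queue.append(other)
    return dist


def diameter(graph):
    """Longest shortest-path distance over all pairs; assumes a connected graph."""
    return max(max(distances_from(graph, a).values()) for a in graph)


def farthest_pairs(graph, length):
    """All unordered author pairs whose distance equals length."""
    pairs = set()
    for a in graph:
        for b, d in distances_from(graph, a).items():
            if d == length:
                pairs.add(tuple(sorted((a, b))))
    return pairs


graph = build_graph(EDGES)
diam = diameter(graph)
print(diam)
for pair in sorted(farthest_pairs(graph, diam)):
    print(pair)
```

With these links, the pairs at distance four all join Ulrich Herb or nuria Lloret Lloret Romero to Ross MacIntyre or Andrea Marchitelli, consistent with the distance-four entries on the distance and shortest-path slides.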

shortest paths from Tomas Baiget

• → Thomas Krichel
• → fernanda peset → nuria Lloret Lloret Romero
• → fernanda peset
• → Imma Subirats → Antonella De Robbio → Ross MacIntyre
• → Ulrich Herb
• → Imma Subirats → Antonella De Robbio
• → Imma Subirats → Antonella De Robbio → Andrea Marchitelli
• → Imma Subirats

shortest paths from Antonella De Robbio

• → Imma Subirats → fernanda peset → nuria Lloret Lloret Romero
• → Imma Subirats
• → Imma Subirats → Tomas Baiget → Ulrich Herb
• → Imma Subirats → Tomas Baiget
• → Imma Subirats → fernanda peset
• → Andrea Marchitelli
• → Ross MacIntyre
• → Thomas Krichel

shortest paths from Ross MacIntyre

• → Antonella De Robbio → Imma Subirats → fernanda peset → nuria Lloret Lloret Romero
• → Antonella De Robbio → Imma Subirats → fernanda peset
• → Antonella De Robbio → Imma Subirats → Tomas Baiget → Ulrich Herb
• → Antonella De Robbio → Thomas Krichel
• → Antonella De Robbio → Imma Subirats → Tomas Baiget
• → Antonella De Robbio → Imma Subirats
• → Antonella De Robbio
• → Andrea Marchitelli

what do the paths tell us?

• We find that some authors appear as intermediaries more often than others.

• In fact, we can count the number of times an author appears as an intermediary in the paths.

• This is what we call the betweenness centrality of an author.

• A large number of authors have a betweenness of zero. They are called marginal authors.
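A Python sketch of the textbook version of this count: for every pair of authors, each author lying on a shortest path between them receives the fraction of those shortest paths that pass through them. The links are reconstructed from the slides (an assumption), and this is not necessarily how icanis computes its figures. Under this convention the top four authors come out in the same order as on the betweenness ranking, but Antonella De Robbio and Imma Subirats are no longer tied, and Thomas Krichel receives a small nonzero score because some tied shortest paths run through him; the slide, which lists him as marginal, appears to have been computed differently.

```python
from collections import deque
from itertools import combinations

# Giant-component links, reconstructed from the slides (an assumption).
EDGES = [
    ("Tomas Baiget", "Thomas Krichel"),
    ("Tomas Baiget", "fernanda peset"),
    ("Tomas Baiget", "Imma Subirats"),
    ("Tomas Baiget", "Ulrich Herb"),
    ("fernanda peset", "nuria Lloret Lloret Romero"),
    ("Imma Subirats", "fernanda peset"),
    ("Imma Subirats", "Antonella De Robbio"),
    ("Imma Subirats", "Thomas Krichel"),
    ("Antonella De Robbio", "Thomas Krichel"),
    ("Antonella De Robbio", "Ross MacIntyre"),
    ("Antonella De Robbio", "Andrea Marchitelli"),
    ("Ross MacIntyre", "Andrea Marchitelli"),
]


def build_graph(edges):
    """Undirected adjacency map built from a list of links."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def path_counts(graph, source):
    """BFS distances and numbers of shortest paths from source to each author."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:  # v precedes w on a shortest path
                sigma[w] += sigma[v]
    return dist, sigma


def betweenness(graph):
    """Sum over author pairs of the share of shortest paths through each author."""
    info = {a: path_counts(graph, a) for a in graph}
    score = {a: 0.0 for a in graph}
    for s, t in combinations(graph, 2):
        dist_s, sigma_s = info[s]
        dist_t, sigma_t = info[t]
        for v in graph:
            if v in (s, t):
                continue
            # v lies on a shortest s-t path exactly when the two legs add up.
            if dist_s[v] + dist_t[v] == dist_s[t]:
                score[v] += sigma_s[v] * sigma_t[v] / sigma_s[t]
    return score


graph = build_graph(EDGES)
for author, value in sorted(betweenness(graph).items(), key=lambda kv: -kv[1]):
    print(f"{value:5.1f} {author}")
```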

summary

• We build a network.
• We find two ways to evaluate authors
  – closeness
  – betweenness

• Now let us look at the results.

ranking for closeness

Columns: rank, name, closeness (average distance; lower is more central).
• 1 Imma Subirats 1.5
• 2 Antonella De Robbio 1.75
• 2 Tomas Baiget 1.75
• 2 Thomas Krichel 1.75
• 5 fernanda peset 1.875
• 6 Andrea Marchitelli 2.5
• 6 Ross MacIntyre 2.5
• 8 Ulrich Herb 2.625
• 9 nuria Lloret Lloret Romero 2.75
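The rank column here follows what is usually called standard competition ranking: tied authors share a rank and the next rank skips the places they used (1, 2, 2, 2, 5, …). A Python sketch that rebuilds the column from the closeness values shown, assuming that a lower average distance ranks higher:

```python
# Closeness values (average distance) copied from the ranking slide.
CLOSENESS = [
    ("Imma Subirats", 1.5),
    ("Antonella De Robbio", 1.75),
    ("Tomas Baiget", 1.75),
    ("Thomas Krichel", 1.75),
    ("fernanda peset", 1.875),
    ("Andrea Marchitelli", 2.5),
    ("Ross MacIntyre", 2.5),
    ("Ulrich Herb", 2.625),
    ("nuria Lloret Lloret Romero", 2.75),
]


def competition_ranks(scores, lower_is_better=True):
    """Assign "1224"-style ranks: tied values share the best available rank."""
    ordered = sorted(scores, key=lambda item: item[1], reverse=not lower_is_better)
    ranked = []
    for position, (name, value) in enumerate(ordered, start=1):
        if ranked and value == ranked[-1][2]:
            rank = ranked[-1][0]  # same value as the previous author: share the rank
        else:
            rank = position
        ranked.append((rank, name, value))
    return ranked


for rank, name, value in competition_ranks(CLOSENESS):
    print(rank, name, value)
```

Passing `lower_is_better=False` gives the 1, 1, 3, 4 pattern of the betweenness ranking, where larger scores rank higher.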

ranking for betweenness

Columns: rank, name, betweenness.
• 1 Antonella De Robbio 2.7
• 1 Imma Subirats 2.7
• 3 Tomas Baiget 2.025
• 4 fernanda peset 1.575
• Andrea Marchitelli, Ross MacIntyre, nuria Lloret Lloret Romero, Thomas Krichel and Ulrich Herb are all marginal.

web service

• E-LIS and AuthorClaim data are readily available in bulk.

• There is software called icanis, developed by yours truly, that can calculate and visualize the results. It is configurable via XSLT.

• Almost instantaneous updates are possible in principle, but they have not been implemented.

coll.e-lis.org

• This is a site that I have set up.
• I think we need a site in the rclis domain, but I am not sure what its name should be.
• coll.e-lis.org is not a good name either.
• So this is meant as a prototype.

features

• Rankings for closeness
• Full path searching from author pages
  – with support for partial name entry
  – but without highlighting of the matched parts

• Unclear documentation

ranking

• Ranking is the way forward for populating scholarly communication services. RePEc has shown this time and again.

• Co-authorship ranking is particularly interesting because authors have to convince their co-authors to publish papers in E-LIS and to claim them in AuthorClaim.

campaign

• We need to do some work on the site.
• Then we can run a campaign and award a cash prize.
• I am thinking about donating $200 to the winner of each category, or $300 to a joint winner.
• The competition would be time-limited, say about three months next summer.
• During that time we would update the site frequently.

Thank you for your attention!

http://openlib.org/home/krichel

write to [email protected]