A social network analysis of research collaboration in the economics community Thomas Krichel (Long Island University) Nisa Bakkalbasi (Yale University)



A social network analysis of research collaboration in the economics community

Thomas Krichel (Long Island University)

Nisa Bakkalbasi (Yale University)


sponsors

● Open Society Institute, through its sponsorship of the ACIS project
– http://acis.openlib.org
– RAS is now based on ACIS.

● Miteq Corp, for the computation support
– They sponsored the use of the 8-CPU HP ProLiant machine on which the computations were done.
– Otherwise they would have taken a verrrrry long time.


structure of this talk

● background on RePEc

● RePEc author service

● centrality as an incentive device

● back to basics

● results using the RePEc author service

● implementation challenges


RePEc essence and history

● It is an open-access abstracting and indexing database about economics.

● It goes back to 1993, when Thomas Krichel started to build indexes of printed and online working papers in economics.

● Now it also covers journal articles and some other publication types such as books and book chapters.


what is interesting about RePEc

● Large

● Unfunded

● Relational

● Evaluation oriented


RePEc is large

● Over 550 archives contribute document data to the collection.

● There are about 350k items described. That is more than in arXiv.org, at a recent count.

● There are about 10 different user services that use RePEc data or process it further.


RePEc is unfunded

● While there are some sponsors for parts of RePEc, neither data collection nor service provision is externally sponsored.

● Most data about publications come from dedicated RePEc archives based at
– economics departments at universities
– other research centers
– some specialized administrative units such as central banks.

● Services are mainly run by amateurs.


RePEc is relational

● RePEc registers not only documents but also researchers and their institutions.

● Institutions are centrally registered by one volunteer, Christian Zimmermann.

● People register with the RePEc Author Service (RAS). More about this later.


RePEc is evaluation-oriented

● Since we have identified authors, we can aggregate evaluative measures over authors and institutions.

● Recently, Christian Zimmermann has built a battery of 22 different indicators for individuals.

● This is a very rich dataset for scientometric exercises.

any questions?


RAS history and essence

● It goes back to 1999 when Thomas Krichel directed work by Markus Johannes Richard Klink to build a special author registration web interface.

● In 2002 the Open Society Institute contributed $50k to develop generic software to implement services such as RAS.
– The software is written by Ivan V. Kurmanov.
– It is called ACIS (Academic Contribution Information System).


how does RAS work?

● Authors contact RAS to let RePEc know what papers they have written.
– Registrants create and maintain a personal profile.
– Registrants create and maintain a name variations profile.
– RAS creates and maintains a contributions profile.

● Once an initial profile is defined, an ACIS mechanism called ARPU alerts authors about documents being added to their profile.

● The contributions profile contains the names of all documents.


what is interesting about RAS?

● Registration of authors solves all problems of trying to identify authors by their names.
– There are many ways to represent the same name. Example: Bruno Van Pottelsbergh De la Potterie (proceedings, page 128). Some RAS registrants have even longer names!
– Many different authors may share the same name, or the same way in which the name can be represented.

● Solving these problems "manually" is very expensive and only feasible for small sets of authors.


but RAS is not complete

● Bakkalbasi and Krichel (2006), http://openlib.org/home/krichel/papers/elba.pdf (the Elba paper), have shown that, at the time of writing,
– Roughly every third RePEc document has at least one registered author.
– Roughly every fourth RePEc authorship is captured by RAS.

● These figures are not likely to change very rapidly.
– RAS gets more registrants.
– RePEc gets more documents.


RAS and co-authorship

● In the Elba paper there is a conjecture that the fact that author A is registered does not significantly increase the chance that the co-authors of A are registered.

● This cannot be formally shown without labouring through attempts to identify unregistered co-authors by name.

● One indication is that the graph formed by co-author relationships in RAS is not dense. This has been found in recent work by Nisa Bakkalbasi.


registration incentive on co-authors

● To get authors to register, we need good incentives.

● In the conventional indicators (Zimmermann's 22), the position of an author depends only on the author's own actions.

● If we use co-authorship, we can devise rankings that depend on co-authorship.

● If we have such a ranking, authors will have incentives to get their co-authors to register.


imagine a RAS-CIS

● A RAS Collaboration Information System should be built.

● RAS-CIS could show the registrants
– local information about shortest paths
– network summaries via centrality indices

● The summary information improves as more of the author's collaborators register.


two tasks to build RAS-CIS

● We have to select the measures to calculate and develop the tools to calculate them. This is what the paper is about.

● We have to build an interface that will allow intuitive access to that data. The data would have to be updated.

● Since there has been no similar service before, this is a hard task. It is not done here.


the job here

● We calculate different centrality rankings of authors.

● We compare the rankings among themselves.

● We want to select the measure that is best to use in a web-based collaboration centrality ranking service.

● RAS-CIS is still to be built.


collaboration graph

● From a social networking perspective, collaboration establishes a graph structure.
– RAS authors are the nodes.
– Collaboration, i.e. a common claim to the same paper, forms the arcs between nodes.
– If there is no common paper claimed by two authors, no arc exists between their nodes.

● Specific results depend on how the arc length is calculated from the collaboration structure.
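
A minimal sketch of how such a graph could be derived from claim data. The input layout (`claims`, a mapping from a paper identifier to the registered authors who claimed it) and all names are assumptions made for illustration; the slides do not describe the actual data structures used.

```python
from itertools import combinations


def build_collaboration_graph(claims):
    """Map each author to the set of authors sharing at least one claimed paper.

    `claims` is a hypothetical layout: paper id -> list of registered authors.
    """
    neighbours = {}
    for authors in claims.values():
        unique = sorted(set(authors))
        for author in unique:
            # Register the node even if the author has no registered co-author.
            neighbours.setdefault(author, set())
        for a, b in combinations(unique, 2):
            # A common claim creates an arc in both directions.
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours


# Invented example data, not taken from RAS.
claims = {
    "paper1": ["alice", "bob"],
    "paper2": ["bob", "carol", "dave"],
    "paper3": ["erin"],
}
graph = build_collaboration_graph(claims)
print(sorted(graph["bob"]))
```

Here "erin" is an author but not a co-author: she is a node without arcs.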


graph components

● If there is a path between one author A and another author B along collaboration arcs, we say that A and B belong to the same component of the collaboration graph.

● It is commonly observed in real-world networks that the largest component is quite large. It usually has more than 50% of all nodes and is therefore known as the giant component.

● Most centrality measures are only meaningful for the members of the giant component.
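
Finding the components can be sketched with a breadth-first search. This is a generic illustration under assumed names (`graph` maps each author to a set of co-authors), not the code used for the paper.

```python
from collections import deque


def connected_components(graph):
    """Return the components of an undirected graph, largest first."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return sorted(components, key=len, reverse=True)


# Invented toy graph: one component of three, one of two, one isolated node.
graph = {
    "alice": {"bob"},
    "bob": {"alice", "carol"},
    "carol": {"bob"},
    "dave": {"erin"},
    "erin": {"dave"},
    "frank": set(),
}
components = connected_components(graph)
giant = components[0]
share = len(giant) / len(graph)
print(sorted(giant), share)
```

In this toy graph the largest component holds half of the nodes; in real collaboration data the share is typically higher, as noted above.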


face the force of facts

● 13049 registrants are found in RAS.

● 9111 registrants (70% of registrants) are authors, i.e. they have claimed at least one paper.

● 6038 registrants (66% of authors) are co-authors, i.e. they are authors who have collaborated with at least one other RAS author.

● 5019 registrants (83% of co-authors) are in the giant component.


the RAS nodes

● 5019 authors is still a rather large network. Compare to the 96 authors in the Hou, Kretschmer and Liu paper on page 77 in the proceedings.

● There are at least 12592671 shortest paths between the authors, and many more other paths.

● Calculating a set of shortest paths takes 10 hours on an 8-CPU machine.
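
The lower bound on the number of shortest paths equals the number of unordered author pairs in the giant component, which suggests it counts one shortest path per pair. A quick check of that arithmetic, together with the percentages quoted for the registrant counts (the variable names are ours):

```python
from math import comb

registrants = 13049
authors = 9111
co_authors = 6038
giant = 5019

# One shortest path per unordered pair of authors in the giant component.
pairs = comb(giant, 2)
print(pairs)

# Shares quoted on the slides, rounded to whole percent.
shares = [
    round(100 * authors / registrants),
    round(100 * co_authors / authors),
    round(100 * giant / co_authors),
]
print(shares)
```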


network type

● Between any two nodes, there is an edge if the authors have ever collaborated.

● But the length of the edge depends on how you measure the strength of collaboration.

● Different edge lengths lead to different networks.

● We introduce three networks in the following three slides.


network 1: binary network

● In the binary network, the collaboration strength between any two authors is one if the two authors have claimed at least one common paper in RAS. The collaboration strength is zero otherwise.

● The edge length is the inverse of the collaboration strength.

● If the collaboration strength is zero, there is no edge between the two nodes.

● We use an algorithm by Newman to do the calculations.
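
In a binary network every edge has length one, so shortest paths can be found by breadth-first search, and several shortest paths may link the same pair. The sketch below is a generic breadth-first search that also counts that multiplicity. The slides do not spell out Newman's algorithm, so this should not be read as the implementation used for the paper; all names are illustrative.

```python
from collections import deque


def bfs_shortest_paths(graph, source):
    """Return (dist, count): hop distance from `source` to each reachable
    node, and the number of distinct shortest paths leading to it."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                # First time we reach this node: record its distance.
                dist[neighbour] = dist[node] + 1
                count[neighbour] = count[node]
                queue.append(neighbour)
            elif dist[neighbour] == dist[node] + 1:
                # Another shortest path arrives through `node`.
                count[neighbour] += count[node]
    return dist, count


# Invented diamond-shaped graph: two shortest paths from "a" to "d".
graph = {
    "a": {"b", "c"},
    "b": {"a", "d"},
    "c": {"a", "d"},
    "d": {"b", "c"},
}
dist, count = bfs_shortest_paths(graph, "a")
print(dist["d"], count["d"])
```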


network 2: symmetric weighted network

● In a symmetric weighted network, for each paper that two authors have claimed in common, we increment the collaboration strength between the two authors by the inverse of the number of authors on that paper minus one, i.e. by 1/(n-1) for a paper with n authors.

● As a result, the total collaboration strength of an author is the number of co-authored papers.

● We used the Dijkstra algorithm to find the shortest paths. This will find only one shortest path between any two authors.
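
A sketch of this weighting and of a Dijkstra search over the resulting edge lengths. It assumes that each paper with n registered authors adds 1/(n-1) to the strength of every author pair on it, which is the reading under which an author's total strength equals the number of co-authored papers, and that the edge length is the inverse of the strength, as in the binary network. The `claims` layout and all names are hypothetical.

```python
import heapq
from collections import defaultdict
from itertools import combinations


def collaboration_strength(claims):
    """Symmetric strengths: each paper with n authors adds 1/(n-1) per pair."""
    strength = defaultdict(lambda: defaultdict(float))
    for authors in claims.values():
        unique = sorted(set(authors))
        n = len(unique)
        if n < 2:
            continue  # single-author papers create no collaboration
        for a, b in combinations(unique, 2):
            strength[a][b] += 1.0 / (n - 1)
            strength[b][a] += 1.0 / (n - 1)
    return strength


def dijkstra(strength, source):
    """Shortest distances from `source`, with edge length = 1 / strength."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for neighbour, s in strength[node].items():
            candidate = d + 1.0 / s
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return dist


# Invented example data, not taken from RAS.
claims = {
    "p1": ["alice", "bob"],
    "p2": ["alice", "bob"],
    "p3": ["bob", "carol", "dave"],
}
strength = collaboration_strength(claims)
total_bob = sum(strength["bob"].values())  # bob co-authored three papers
dist = dijkstra(strength, "alice")
print(total_bob, dist["bob"], dist["carol"])
```

Two joint papers make the alice-bob edge short (length 0.5), while the three-author paper yields weaker ties and longer edges (length 2).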


network 3: random walk network

● In this type of network, we normalize the total collaboration strength of each author to one.

● This generates an asymmetric network where inward edges are shorter for important authors who have written more papers.

● This type of measure is used in SNA to measure prestige.

● We used the Dijkstra algorithm to find the shortest paths. This will find only one shortest path.
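
One way to read this normalization, sketched under our own assumptions: divide each author's strengths by their sum, so that every author's outgoing strengths add up to one, and take the edge length as the inverse of the normalized strength. The starting strengths below are invented, and the function names are hypothetical.

```python
def normalise_strength(strength):
    """Scale each author's outgoing strengths so that they sum to one."""
    normalised = {}
    for author, partners in strength.items():
        total = sum(partners.values())
        normalised[author] = {p: s / total for p, s in partners.items()}
    return normalised


def edge_lengths(normalised):
    """Directed edge length = inverse of the normalised strength."""
    return {
        author: {p: 1.0 / s for p, s in partners.items()}
        for author, partners in normalised.items()
    }


# Invented symmetric strengths; bob has co-authored the most papers.
strength = {
    "alice": {"bob": 2.0},
    "bob": {"alice": 2.0, "carol": 0.5, "dave": 0.5},
    "carol": {"bob": 0.5, "dave": 0.5},
    "dave": {"bob": 0.5, "carol": 0.5},
}
normalised = normalise_strength(strength)
lengths = edge_lengths(normalised)
print(lengths["carol"]["bob"], lengths["bob"]["carol"])
```

The edge from carol to bob is shorter than the edge from bob to carol: bob spreads his unit of strength over more collaboration, so the edges pointing towards him are the short ones.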


centrality measures

● For each network, we can look at two centrality measures.
– closeness centrality: a node is more central if it has a shorter average shortest path leading to all other nodes.
– betweenness centrality: a node is more central if it lies on more of the shortest paths leading from one node to another.

● These centrality measures rank authors from the most central to the least central.
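
Both measures can be sketched for a small connected binary network. The code is a naive illustration under assumed names; it counts every shortest path, splits credit equally among the shortest paths of a pair, and would be far too slow for a network of 5019 nodes. Closeness is reported as the inverse of the average distance so that larger values mean more central.

```python
from collections import deque
from itertools import combinations


def bfs(graph, source):
    """Hop distances and shortest-path counts from `source`."""
    dist, count = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                count[neighbour] = count[node]
                queue.append(neighbour)
            elif dist[neighbour] == dist[node] + 1:
                count[neighbour] += count[node]
    return dist, count


def closeness(graph):
    """Inverse of the average distance to all other nodes.

    Assumes a connected graph with at least two nodes.
    """
    result = {}
    for node in graph:
        dist, _ = bfs(graph, node)
        others = [d for other, d in dist.items() if other != node]
        result[node] = len(others) / sum(others)
    return result


def betweenness(graph):
    """For each node, the summed share of shortest paths passing through it."""
    info = {node: bfs(graph, node) for node in graph}
    score = {node: 0.0 for node in graph}
    for s, t in combinations(graph, 2):
        dist_s, count_s = info[s]
        dist_t, count_t = info[t]
        for v in graph:
            if v in (s, t):
                continue
            # v lies on a shortest s-t path if the two legs add up exactly.
            if dist_s[v] + dist_t[v] == dist_s[t]:
                score[v] += count_s[v] * count_t[v] / count_s[t]
    return score


# Invented path graph a - b - c - d.
graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
close = closeness(graph)
between = betweenness(graph)
print(close["a"], close["b"], between["b"])
```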


notation for centrality measures

● BIC closeness centrality in the binary network

● BIB betweenness centrality in the binary network

● SYC closeness centrality in the symmetric weighted network

● SYB betweenness centrality in the symmetric weighted network

● RWC closeness centrality in the random walk network

● RWB betweenness centrality in the random walk network


pair-wise Spearman rank correlation

      BIC   BIB   SYC   SYB   RWC   RWB
BIC   1     .60   .90   .52   .89   .30
BIB   .60   1     .54   .81   .61   .57
SYC   .90   .54   1     .54   .91   .23
SYB   .52   .81   .54   1     .56   .42
RWC   .87   .61   .91   .56   1     .41
RWB   .30   .57   .23   .42   .41   1
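
For reference, a Spearman rank correlation is the Pearson correlation of the two rank vectors. The sketch below uses average ranks for ties; the scores are invented and the function names are ours, not those of the software used for the table.

```python
def average_ranks(values):
    """1-based ranks in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented centrality scores for four authors.
closeness_scores = [0.9, 0.7, 0.5, 0.3]
betweenness_scores = [10, 40, 30, 0]
rho = spearman(closeness_scores, betweenness_scores)
print(round(rho, 2))
```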


comments

● All three closeness measures produce very similar rankings.

● SYB and BIB are close, but RWB is quite far off both of them.

● Overall, the choice between betweenness and closeness seems to be more important than the choice between network models. This has been a surprise to us. BIC and BIB correlate at only .60, and the corresponding pairs in the other two networks correlate even less.


adding the number of documents

● We can add the number of documents as an additional ranking criterion, NDO. We get

      NDO   BIC   BIB   SYC   SYB   RWC   RWB
NDO   1     .68   .55   .71   .60   .70   .19

● Overall, the symmetric weighted network appears to be best correlated with the number of documents. This should come as no surprise.


why add this alien number NDO?

● We can think of NDO as the simplest indication of the personal fame of an author.

● If we want to incentivize authors to climb the ranks of a collaboration centrality ranking, we need to have people at the top whom they actually recognize.

● Remember Groucho Marx: "I'll never join a club that accepts me as a member".

● Thus the symmetric weighted network appears appealing.


symmetric weighted network

● If we use the symmetric weighted network in an interface, the numbers that come out for closeness are not intuitive because the total path lengths are fractions.

● But the fact that there should be much less path multiplicity makes the presentation simpler.

● But the paths may be longer (in simple counts of intermediate nodes) than counts in the binary model.


RAS-CIS

● The most difficult aspect is to build the interface when there is no similar service present at this time.

● The updating cannot be done instantaneously, but ought to be close to it.
– If the contributions profile of an author changes, we can recalculate her paths.
– We can also recalculate the paths of her co-authors.
– But then we end up with an overall network that is no longer symmetric.


more work

● RAS authorship data are a high-quality dataset that is easy to use.

● It is not widely used at this point.

● Note in particular that much of the data affecting collaboration has not been worked on
– affiliation data
– journal/series data
– subject classification data

● New ideas and partnerships welcome!


Thank you for your attention!

http://openlib.org/home/krichel

write to [email protected]