The Structure of Computer Science Knowledge Network

Lehrstuhl Informatik 5 (Informationssysteme), Prof. Dr. Matthias Jarke. TeLLNet. The Structure of the Computer Science Knowledge Network. Manh Cuong Pham, Ralf Klamma, Information Systems and Database Technology, RWTH Aachen, Germany. Odense, Denmark, August 09, 2010. ASONAM 2010.


Page 1: The Structure of Computer Science Knowledge Network

Lehrstuhl Informatik 5 (Informationssysteme)

Prof. Dr. Matthias Jarke
I5-PK-0810-1

Manh Cuong Pham, Ralf Klamma

TeLLNet

The Structure of the Computer Science Knowledge Network

Manh Cuong Pham, Ralf Klamma
Information Systems and Database Technology

RWTH Aachen, Germany

Odense, Denmark, August 09, 2010

ASONAM 2010

Page 2: The Structure of Computer Science Knowledge Network

Agenda

- Introduction
- SNA as a knowledge discovery method
- Data sets: DBLP and CiteSeerX
- Network visualization
- Venue ranking
- Conclusions and Outlook

Page 3: The Structure of Computer Science Knowledge Network

Introduction

Digital libraries (in computer science)
- DBLP, ACM DL, IEEE Xplore, CiteSeerX, etc.
- Digital media for scientific knowledge conservation
  - Publications
  - Venues
- Development of research communities & research areas
- Knowledge discovery: citation analysis, usage analysis, etc.
- Digital libraries in Web 2.0: Mendeley, ResearchGate, etc.

Problems
- Structure of computer science knowledge
- Existing research fields
- The interconnection between fields

Figure: VLDB community in 2006 (DBLP)

Figure: VLDB community in 1990 (DBLP)

Page 4: The Structure of Computer Science Knowledge Network

Motivations

Scientometrics
- Unit of analysis: journals
- Knowledge mapping: building, visualizing and analyzing the knowledge network
- Methods:
  - Citation analysis [Boyack 2005]
  - Content analysis
  - Log-data (usage data) analysis [Bollen 2009]
- Data sets:
  - Journal Citation Reports (JCR)
  - Science Citation Index (SCI)
  - Social Sciences Citation Index (SSCI), etc.

Problem
- Computer science conferences

Page 5: The Structure of Computer Science Knowledge Network

Our Approach

Combination of large-scale digital libraries
- DBLP
- CiteSeerX

Citation analysis
- Bibliographic coupling at venue level (conferences, journals)
- Similarity measures

SNA as a knowledge discovery method
- Visual analytics
- Cluster analysis
- SNA measures: PageRank, betweenness, hub and authority scores, etc.
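
To make one of the SNA measures listed on this slide concrete, here is a minimal sketch of PageRank computed by power iteration on a weighted, directed venue graph. It is an illustration only, not the authors' implementation: the function name `pagerank`, the damping factor of 0.85, the iteration limits, and the toy venue labels are all assumptions.

```python
def pagerank(edges, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank.

    edges: dict mapping (source, target) -> positive weight,
           e.g. how often venue `source` cites venue `target`.
    Returns a dict node -> score; the scores sum to 1.
    """
    nodes = sorted({v for edge in edges for v in edge})
    n = len(nodes)
    if n == 0:
        return {}

    # Total outgoing weight per node, used to normalise each node's votes.
    out_weight = {v: 0.0 for v in nodes}
    for (src, _dst), w in edges.items():
        out_weight[src] += w

    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Nodes without outgoing edges spread their score evenly.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0.0)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        for (src, dst), w in edges.items():
            new_rank[dst] += damping * rank[src] * w / out_weight[src]
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Toy example with hypothetical venues: A, B and C cite D, and D cites A.
toy_edges = {("A", "D"): 1, ("B", "D"): 1, ("C", "D"): 1, ("D", "A"): 1}
scores = pagerank(toy_edges)
for venue in sorted(scores, key=scores.get, reverse=True):
    print(venue, round(scores[venue], 4))
```

For an undirected network, one option is to add each edge in both directions before calling the function.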

Page 6: The Structure of Computer Science Knowledge Network

Data Sets

DBLP (http://www.informatik.uni-trier.de/~ley/db/)
- 788,259 author names
- 1,226,412 publications
- 3,490 venues (conferences, workshops, journals)

CiteSeerX (http://citeseerx.ist.psu.edu/)
- 7,385,652 publications (including publications in reference lists)
- 22,735,240 citations
- Over 4 million author names

Combination
- Canopy clustering [McCallum 2000]
- Result: 864,097 matched pairs
- On average, a venue cites 2,306 times and is cited 2,037 times
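
The combination step on this slide can be sketched roughly as follows. Canopy clustering first groups records with a cheap distance, and only records that share a canopy are compared with a more expensive measure. Everything specific here is an assumption made for illustration: matching on titles only, token-level Jaccard distance as the cheap measure, `difflib.SequenceMatcher` as the expensive one, the threshold values, and all function names. The slides do not state which fields or thresholds were actually used.

```python
from difflib import SequenceMatcher


def tokens(title):
    """Lower-cased word set used by the cheap distance."""
    return frozenset(title.lower().split())


def cheap_distance(a, b):
    """Jaccard distance between two token sets (0 = identical, 1 = disjoint)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def canopies(records, loose=0.7, tight=0.3):
    """Group record ids into (possibly overlapping) canopies.

    records: dict id -> title. `loose` and `tight` are illustrative
    thresholds with tight <= loose.
    """
    toks = {rid: tokens(title) for rid, title in records.items()}
    remaining = list(records)  # records that may still become a canopy centre
    result = []
    while remaining:
        centre = remaining[0]
        # Every record within the loose threshold joins this canopy.
        result.append([rid for rid in records
                       if cheap_distance(toks[centre], toks[rid]) <= loose])
        # Records within the tight threshold can no longer start a canopy.
        remaining = [rid for rid in remaining
                     if cheap_distance(toks[centre], toks[rid]) > tight]
    return result


def match_pairs(dblp, citeseer, min_ratio=0.9):
    """Return (dblp_id, citeseer_id) pairs that pass the expensive comparison."""
    records = {("dblp", key): title for key, title in dblp.items()}
    records.update({("cs", key): title for key, title in citeseer.items()})
    pairs = set()
    for canopy in canopies(records):
        left = [rid for rid in canopy if rid[0] == "dblp"]
        right = [rid for rid in canopy if rid[0] == "cs"]
        for l_id in left:
            for r_id in right:
                ratio = SequenceMatcher(
                    None, records[l_id].lower(), records[r_id].lower()).ratio()
                if ratio >= min_ratio:
                    pairs.add((l_id[1], r_id[1]))
    return pairs


# Hypothetical toy records, not taken from either library.
dblp_titles = {1: "The Structure of the Computer Science Knowledge Network",
               2: "Efficient Clustering of High-Dimensional Data Sets"}
citeseer_titles = {"a": "the structure of the computer science knowledge network",
                   "b": "Fast unfolding of communities in large networks"}
print(match_pairs(dblp_titles, citeseer_titles))
```

The point of the two-stage design is that the expensive comparison runs only inside canopies, which matters when millions of records have to be matched.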

Page 7: The Structure of Computer Science Knowledge Network

Network Creation and Pre-processing

Knowledge network
- Aggregate bibliographic coupling counts at venue level
- Undirected graph G(V, E), where V: venues, E: edges weighted by cosine similarity

$$C_{i,j} = \frac{\sum_{k=1}^{n} B_{i,k}\, B_{j,k}}{\sqrt{\sum_{k=1}^{n} B_{i,k}^{2}}\;\sqrt{\sum_{k=1}^{n} B_{j,k}^{2}}}$$

- Threshold: $C_{i,j} \ge 0.1$
- Clustering: density-based algorithm [Newman 2004, Clauset 2004]
- Network visualization: force-directed paradigm [Fruchterman 1991]

Knowledge flow network
- Aggregate bibliographic coupling counts at venue level
- Threshold: citation counts >= 50

Domains from Microsoft Academic Search (http://academic.research.microsoft.com/)
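
The construction of the cosine-weighted knowledge network on this slide might look like the sketch below. It assumes that row i of a matrix `B` holds the aggregated bibliographic coupling counts of venue i against each of the n venues, and that an edge is kept when its similarity reaches the 0.1 threshold. The function names, the handling of all-zero rows, and the toy matrix are assumptions for illustration, not the authors' code.

```python
import math


def cosine_similarity(b_i, b_j):
    """Cosine of the angle between two coupling-count vectors."""
    dot = sum(x * y for x, y in zip(b_i, b_j))
    norm_i = math.sqrt(sum(x * x for x in b_i))
    norm_j = math.sqrt(sum(y * y for y in b_j))
    if norm_i == 0.0 or norm_j == 0.0:
        return 0.0  # assumption: a venue with no coupling is similar to nothing
    return dot / (norm_i * norm_j)


def knowledge_network(venues, B, threshold=0.1):
    """Return undirected edges {(venue_i, venue_j): similarity} above the threshold."""
    edges = {}
    for i in range(len(venues)):
        for j in range(i + 1, len(venues)):
            c = cosine_similarity(B[i], B[j])
            if c >= threshold:
                edges[(venues[i], venues[j])] = c
    return edges


# Toy coupling counts for three venue labels; the numbers are made up.
venues = ["VLDB", "SIGMOD", "CHI"]
B = [
    [10, 8, 0],
    [8, 10, 0],
    [0, 0, 5],
]
for pair, weight in knowledge_network(venues, B).items():
    print(pair, round(weight, 3))
```

Because cosine similarity normalises by the vector lengths, large venues with many publications do not dominate the edge weights simply through volume.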

Page 8: The Structure of Computer Science Knowledge Network

Knowledge Network: the Visualization

Page 9: The Structure of Computer Science Knowledge Network

Knowledge Network: Clustering

Page 10: The Structure of Computer Science Knowledge Network

Interdisciplinary Venues: Top Betweenness Centrality

Page 11: The Structure of Computer Science Knowledge Network

High Prestige Series: Top PageRank

Page 12: The Structure of Computer Science Knowledge Network

Conclusions and Future Research

SNA helps to gain insight into computer science knowledge

Knowledge network in computer science
- Highly clustered; large clusters form the core of computer science research
- Research fields are interconnected
- Interdisciplinary venues

Outlook
- More digital libraries should be integrated: ACM, IEEE, CEUR-WS.org, etc.
- Usage analysis
- Dynamic analysis of the knowledge network

Page 13: The Structure of Computer Science Knowledge Network

Questions?

http://bosch.informatik.rwth-aachen.de:5080/AERCS/