EKAW2014 - A Hybrid Semantic Approach to Building Dynamic Maps of Research Communities

Francesco Osborne, Beppe Scavo, Enrico Motta. KMi, The Open University, United Kingdom. November 27th 2014. A Hybrid Semantic Approach to Building Dynamic Maps of Research Communities

Upload: francesco-osborne

Post on 12-Jul-2015


TRANSCRIPT

Page 1: EKAW2014 - A Hybrid Semantic Approach to Building Dynamic Maps of Research Communities

Francesco Osborne, Beppe Scavo, Enrico Motta

KMi, The Open University, United Kingdom

November 27th 2014

A Hybrid Semantic Approach to Building Dynamic Maps of Research Communities

Page 2: EKAW2014 - A Hybrid Semantic Approach to Building Dynamic Maps of Research Communities

Research communities: the engine of research.

Page 3: EKAW2014 - A Hybrid Semantic Approach to Building Dynamic Maps of Research Communities

We need to understand how scientific communities adapt and cooperate to implement visions into concrete technologies.

Page 4: EKAW2014 - A Hybrid Semantic Approach to Building Dynamic Maps of Research Communities

Communities of academic authors are usually identified by using standard community detection algorithms, which typically exploit co-authorship or citation graphs.

Research communities

Page 5: EKAW2014 - A Hybrid Semantic Approach to Building Dynamic Maps of Research Communities

A different type of community we investigated is formed by the set of researchers who, at a given time, are following a shared research trajectory, i.e. they are working on the same topics at the same time.

Temporal topic-based communities (TTC)

Osborne, F., Scavo, G., & Motta, E. (2014). Identifying diachronic topic-based research communities by clustering shared research trajectories. In The Semantic Web: Trends and Challenges (pp. 114-129). Springer International Publishing.

Page 6: EKAW2014 - A Hybrid Semantic Approach to Building Dynamic Maps of Research Communities

Research Communities Map Builder

• RCMB is able to automatically link diachronic topic-based communities over subsequent time intervals to identify significant events.

• These include topic shifts within a community; the appearance and fading of a community; communities splitting, merging, spawning other communities; etc.

• The output of RCMB is a map of research communities, annotated with the detected events, which provides a concise visual representation of the dynamics of a research area.

Page 7: EKAW2014 - A Hybrid Semantic Approach to Building Dynamic Maps of Research Communities

RCMB steps:

1. Applies the Temporal Semantic Topic-Based Clustering (TST) algorithm to find Temporal topic-based communities in different time intervals;

2. Detects Topic Shifts;

3. Links Communities in different years;

4. Detects Key Events.

Page 8: EKAW2014 - A Hybrid Semantic Approach to Building Dynamic Maps of Research Communities

RCMB steps:

1. Applies the Temporal Semantic Topic-Based Clustering (TST) algorithm to find Temporal topic-based communities in different time intervals.

2. Detects Topic Shifts in the following years

3. Links Communities in different years

4. Detects Key Events

Temporal Semantic Topic-Based Clustering

Osborne, F., Scavo, G., & Motta, E. (2014). Identifying diachronic topic-based research communities by clustering shared research trajectories. In The Semantic Web: Trends and Challenges (pp. 114-129). Springer International Publishing.

Page 9: EKAW2014 - A Hybrid Semantic Approach to Building Dynamic Maps of Research Communities

TST in short

1. It augments the topics semantically using an automatically generated OWL ontology and represents each author as a semantic topic distribution over subsequent years.

2. It weighs each topic according to its relationship with the main topic, to highlight the communities strongly related to the main topic.

3. It clusters authors using the ATTS (Adjusted Temporal Topic Similarity), which is computed by averaging the cosine similarities of the topic vectors over progressively smaller intervals of time.
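As a rough illustration of step 3, the sketch below averages cosine similarities between two authors over progressively smaller time windows. It is not the ATTS formula from the cited paper: the halving of the window size, the summing of yearly vectors inside a window, and the names `cosine`, `sum_rows` and `temporal_similarity` are all assumptions made for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def sum_rows(rows):
    """Add up the yearly topic vectors inside one time window."""
    return [sum(col) for col in zip(*rows)]

def temporal_similarity(a, b):
    """a, b: lists of per-year topic vectors covering the same years and topics.

    Assumption: the window size starts at the whole period and is halved
    until it reaches a single year; the result is the mean over all sizes.
    """
    n = len(a)
    per_size = []
    size = n
    while size >= 1:
        sims = [
            cosine(sum_rows(a[s:s + size]), sum_rows(b[s:s + size]))
            for s in range(0, n, size)
        ]
        per_size.append(sum(sims) / len(sims))
        size //= 2
    return sum(per_size) / len(per_size)

# Hypothetical authors: same overall topics, but worked on in opposite years.
author_a = [[1, 0], [0, 1]]
author_b = [[0, 1], [1, 0]]
score_same = temporal_similarity(author_a, author_a)
score_swapped = temporal_similarity(author_a, author_b)
```

In this toy case the two authors look identical over the whole period but share nothing in any single year, so the averaged score falls halfway between the two extremes. This is the effect that comparing over smaller intervals is meant to capture.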

Page 10: EKAW2014 - A Hybrid Semantic Approach to Building Dynamic Maps of Research Communities

Detecting Topic Shifts

We use a sliding window algorithm that checks for a topic shift by comparing the initial topic distribution at time t with the topic distributions at times t+1, t+2, …, t+n.

Information Extraction/Semantic Annotation community

2002: Information Extraction 26%, Natural Language 17%, Named Entity 12%, Machine Learning 9%, Knowledge Base 9%

2006: Semantic Annotation 25%, Knowledge Base 15%, Semantic Wiki 11%, Information Extraction 10%, Semantic Information 8%, Natural Language 6%, Information Retrieval 6%

2010: Linked Data 16%, Natural Language 15%, Semantic Annotation 15%, SW Technology 10%, Information Retrieval 10%, Knowledge Base 9%, Semantic Wiki 9%
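The sliding-window comparison can be sketched as follows. This is a minimal illustration, not the RCMB implementation: the significance check is passed in as a function, and the stand-in used here (total variation distance above a fixed threshold) is only a placeholder for a proper statistical test. Restarting the window at the year where a shift is found is also an assumption, as are the names `find_topic_shifts` and `total_variation` and the toy data.

```python
def total_variation(p, q):
    """Half the L1 distance between two topic distributions (dicts topic -> share)."""
    topics = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in topics)

def find_topic_shifts(dist_by_year, window, differs):
    """Return (base_year, shift_year) pairs.

    dist_by_year: dict year -> topic distribution
    window:       how many later years to compare against the base year (n)
    differs:      function(p, q) -> bool, the significance check
    """
    years = sorted(dist_by_year)
    shifts = []
    i = 0
    while i < len(years):
        next_i = i + 1
        last = min(i + window, len(years) - 1)
        for j in range(i + 1, last + 1):
            if differs(dist_by_year[years[i]], dist_by_year[years[j]]):
                shifts.append((years[i], years[j]))
                next_i = j  # assumption: restart from the year of the shift
                break
        i = next_i
    return shifts

# Hypothetical community with two topics, A and B.
toy = {
    2000: {"A": 0.80, "B": 0.20},
    2001: {"A": 0.75, "B": 0.25},
    2002: {"A": 0.40, "B": 0.60},
    2003: {"A": 0.35, "B": 0.65},
    2004: {"A": 0.30, "B": 0.70},
}
shifts = find_topic_shifts(toy, window=2,
                           differs=lambda p, q: total_variation(p, q) > 0.3)
```

With these toy values the only detected shift is between 2000 and 2002; the later years drift too slowly to cross the placeholder threshold.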

Page 11: EKAW2014 - A Hybrid Semantic Approach to Building Dynamic Maps of Research Communities

We define a topic shift as a statistically significant change (detected via a chi-square test) in the topic distribution of a community that occurred in a certain time interval.

To detect which topics were the main protagonists of this shift, we apply the same test, excluding a different topic each time, and select the topic whose absence yields the largest increase in the p-value.
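A minimal sketch of the test and the leave-one-topic-out step is given below, using only the standard library. It assumes the two distributions are available as raw counts (for example, papers per topic) and treats them as a 2 x k contingency table with k-1 degrees of freedom; the slides do not state the exact test setup, so this framing, the function names and the toy counts are all assumptions.

```python
import math

def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    x = stat / 2.0
    if df % 2:
        a, q = 0.5, math.erfc(math.sqrt(x))
    else:
        a, q = 1.0, math.exp(-x)
    # Recurrence Q(a+1, x) = Q(a, x) + x^a e^(-x) / Gamma(a+1)
    for _ in range((df - 1) // 2):
        q += x ** a * math.exp(-x) / math.gamma(a + 1)
        a += 1
    return q

def shift_p_value(before, after):
    """Chi-square test of homogeneity on two dicts topic -> count."""
    topics = sorted(set(before) | set(after))
    if len(topics) < 2:
        return 1.0  # nothing to compare with a single topic
    rows = [[before.get(t, 0) for t in topics],
            [after.get(t, 0) for t in topics]]
    row_tot = [sum(r) for r in rows]
    col_tot = [sum(c) for c in zip(*rows)]
    total = sum(row_tot)
    stat = 0.0
    for row, rt in zip(rows, row_tot):
        for obs, ct in zip(row, col_tot):
            expected = rt * ct / total
            if expected > 0:
                stat += (obs - expected) ** 2 / expected
    return chi2_sf(stat, len(topics) - 1)

def main_shift_topic(before, after):
    """Topic whose exclusion raises the p-value the most, plus the base p-value."""
    base_p = shift_p_value(before, after)
    best_topic, best_gain = None, -math.inf
    for topic in sorted(set(before) | set(after)):
        b = {t: c for t, c in before.items() if t != topic}
        a = {t: c for t, c in after.items() if t != topic}
        gain = shift_p_value(b, a) - base_p
        if gain > best_gain:
            best_topic, best_gain = topic, gain
    return best_topic, base_p

# Hypothetical counts: topic A collapses while B and C grow in equal measure.
before = {"A": 50, "B": 25, "C": 25}
after = {"A": 10, "B": 45, "C": 45}
topic, p_value = main_shift_topic(before, after)
```

Here the change is significant at the usual 0.05 level, and removing topic A makes the remaining distributions proportional, so A is reported as the protagonist of the shift.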

Detecting Topic Shifts

Page 12: EKAW2014 - A Hybrid Semantic Approach to Building Dynamic Maps of Research Communities

Community linking

We are interested in two different kinds of links between communities:

• The strong link is defined as a link that connects the same community in subsequent timeframes.

• The weak link is defined as the link that connects community C1 with community C2 in a subsequent timeframe, if C1 has an impact over C2 in terms of migrating authors and/or topics.
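To make the two kinds of link concrete, here is a simplified sketch that looks only at migrating authors. The measure (the fraction of a community's authors found in a later community), the rule that at most one outgoing link per community can be strong, and the thresholds `t_strong` and `t_weak` are assumptions for illustration; the actual RCMB linking functions are not reproduced in this transcript and also take topics into account.

```python
def author_flow(c1, c2):
    """Fraction of c1's authors that are also members of c2."""
    return len(c1 & c2) / len(c1) if c1 else 0.0

def link_communities(prev, curr, t_strong, t_weak):
    """prev, curr: dicts community name -> set of author ids.

    Returns a list of (source, target, kind) with kind "strong" or "weak".
    """
    links = []
    for src, src_authors in prev.items():
        scores = {dst: author_flow(src_authors, dst_authors)
                  for dst, dst_authors in curr.items()}
        if not scores:
            continue
        best = max(scores, key=scores.get)
        for dst, score in scores.items():
            if dst == best and score >= t_strong:
                links.append((src, dst, "strong"))  # same community continuing
            elif score >= t_weak:
                links.append((src, dst, "weak"))    # some authors migrated
    return links

# Hypothetical communities in two consecutive timeframes.
prev = {"C1": {"a", "b", "c", "d"}, "C2": {"e", "f"}}
curr = {"C1": {"a", "b", "c", "x"}, "C3": {"d", "e", "f"}}
links = link_communities(prev, curr, t_strong=0.6, t_weak=0.2)
```

In this toy data C1 keeps three of its four authors (strong link to C1) and sends one to C3 (weak link), while all of C2's authors move to C3.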

Page 13: EKAW2014 - A Hybrid Semantic Approach to Building Dynamic Maps of Research Communities

Community linking

Page 14: EKAW2014 - A Hybrid Semantic Approach to Building Dynamic Maps of Research Communities

Community linking

We take the minimum values of ts and tw that minimize the MEF, using the Nelder-Mead algorithm.

Page 15: EKAW2014 - A Hybrid Semantic Approach to Building Dynamic Maps of Research Communities

Key Events detection

If a community has no strong links with any community in the preceding interval, we detect the appearance of a community.

[Figure: example communities in 2006 and 2007 (labels C1, C2, C3), with topics Natural Language, Dialogue Systems, Speech Recognition, Pattern Recognition and Human Robot Interaction.]

Page 16: EKAW2014 - A Hybrid Semantic Approach to Building Dynamic Maps of Research Communities

Key Events detection

If a community has no strong links with any community in the subsequent interval, we detect the fading of a community.

[Figure: example communities in 2006 and 2007 (labels C1, C2, C3), with topics Natural Language, Dialogue Systems, Speech Recognition, Pattern Recognition and Human Robot Interaction.]

Page 17: EKAW2014 - A Hybrid Semantic Approach to Building Dynamic Maps of Research Communities

Key Events detection

If a community is linked to more than one community in the subsequent interval and one of the links is a strong one, we detect the forking of one or more communities out of the community characterized by the strong link.

[Figure: example communities in 2006 and 2007 (labels C1, C2), with topics Natural Language, Dialogue Systems, Speech Recognition, Pattern Recognition and Human Robot Interaction.]

Page 18: EKAW2014 - A Hybrid Semantic Approach to Building Dynamic Maps of Research Communities

Key Events detection

If a community is linked to more than one community in the subsequent interval and none of the links is a strong one, we detect the splitting of a community into multiple communities.

[Figure: example communities in 2006 and 2007 (labels C1, C2, C3), with topics Natural Language, Dialogue Systems, Speech Recognition, Pattern Recognition and Human Robot Interaction.]

Page 19: EKAW2014 - A Hybrid Semantic Approach to Building Dynamic Maps of Research Communities

Key Events detection

If two or more communities are linked to one community in the subsequent interval and one of the inlinks is a strong link, we detect the assimilation of one or more communities into the community C characterized by the strong link.

[Figure: example communities in 2006 and 2007 (labels C1, C2), with topics Natural Language, Dialogue Systems, Speech Recognition, Pattern Recognition and Human Robot Interaction.]

If the communities fade after the event, they are labelled as absorbed to C.

Page 20: EKAW2014 - A Hybrid Semantic Approach to Building Dynamic Maps of Research Communities

Key Events detection

If two or more communities are linked to one community in the subsequent interval and none of the inlinks is a strong link, we detect the merging of two or more communities into a new community C.

[Figure: example communities in 2006 and 2007 (labels C1, C2, C3), with topics Natural Language, Dialogue Systems, Speech Recognition, Pattern Recognition and Human Robot Interaction.]

If the communities fade after the event, they are labelled as merged in C.
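The six rules on the preceding slides (appearance, fading, forking, splitting, assimilation, merging) can be sketched as a simple pass over the links between two consecutive intervals. The data layout, with links as (source, target, kind) triples, and the function name `detect_events` are assumptions for this illustration; the follow-up labels "absorbed" and "merged" for communities that fade afterwards are left out.

```python
def detect_events(prev, curr, links):
    """prev, curr: community names in two consecutive intervals.
    links: list of (source, target, kind), kind being "strong" or "weak".

    Returns a list of (event, community) pairs.
    """
    outgoing = {c: [] for c in prev}
    incoming = {c: [] for c in curr}
    for src, dst, kind in links:
        outgoing[src].append(kind)
        incoming[dst].append(kind)

    events = []
    # Appearance: no strong link from the preceding interval.
    for c in curr:
        if "strong" not in incoming[c]:
            events.append(("appearance", c))
    # Fading: no strong link to the subsequent interval.
    for c in prev:
        if "strong" not in outgoing[c]:
            events.append(("fading", c))
    # Forking or splitting: more than one outgoing link.
    for c in prev:
        if len(outgoing[c]) > 1:
            kind = "forking" if "strong" in outgoing[c] else "splitting"
            events.append((kind, c))
    # Assimilation or merging: more than one incoming link.
    for c in curr:
        if len(incoming[c]) > 1:
            kind = "assimilation" if "strong" in incoming[c] else "merging"
            events.append((kind, c))
    return events

# Hypothetical example: A continues and forks, B fades, and C is newly
# formed by authors coming from both A and B.
events = detect_events(
    prev=["A", "B"],
    curr=["A", "C"],
    links=[("A", "A", "strong"), ("A", "C", "weak"), ("B", "C", "weak")],
)
```

A single community can trigger several events at once, as C does here by both appearing and being the result of a merge.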

Page 21: EKAW2014 - A Hybrid Semantic Approach to Building Dynamic Maps of Research Communities

Evaluation: Cluster Compactness

Page 22: EKAW2014 - A Hybrid Semantic Approach to Building Dynamic Maps of Research Communities

Case study

We applied RCMB to two research areas: World Wide Web (WWW) and Semantic Web (SW).

Our study was based on a dataset built from data retrieved by means of the API provided by Microsoft Academic Search.

We first retrieved authors and papers labelled with WWW and SW or with their first 150 co-occurring topics. We then ran RCMB on WWW and SW in the 2000-2010 time interval with a granularity of 3. The average number of authors selected in each year was 932 for WWW and 646 for SW.

Page 23: EKAW2014 - A Hybrid Semantic Approach to Building Dynamic Maps of Research Communities

Semantic Web

Page 24: EKAW2014 - A Hybrid Semantic Approach to Building Dynamic Maps of Research Communities

WWW

Page 25: EKAW2014 - A Hybrid Semantic Approach to Building Dynamic Maps of Research Communities

Future Work

• Automatically generate comprehensive explanations for the identified dynamics.

• Forecast topic shifts and key events, e.g., estimating the probability that a new topic will emerge in a certain community or that two communities will merge in the coming years.

Page 26: EKAW2014 - A Hybrid Semantic Approach to Building Dynamic Maps of Research Communities

Questions?

Interested in scholarly data?

SAVE-SD 2015: Semantics, Analytics, Visualisation: Enhancing Scholarly Data. Workshop at the 24th International World Wide Web Conference

May 19, 2015 - Florence, Italy

Site: cs.unibo.it/save-sd