
Early Detection and Forecasting of Research Trends

Angelo Antonio Salatino @angelosalatino

Advisors: Prof. Enrico Motta, Dr. Francesco Osborne

ISWC 2015 – Doctoral Consortium

Problem

Who cares?
• Researchers: following the evolution of the research environment
• Academic publishers: promoting up-to-date and interesting content
• Companies: early intelligence on potentially important research trends, to remain at the forefront of innovation
• Funding bodies: improved understanding of the research landscape

State of the art: Trend detection

• Topic evolution using bibliometric analysis:
  – Content analysis
    • Topic extraction
    • Main terms in documents
  – Citation analysis
  – Main limitation: cannot detect new trends early enough in their lifecycle

[Wu et al. 2011, Bolelli et al. 2009, He et al. 2009]

State of the art: Forecasting impact

• Impact based on the number of publications and authors associated with topics
• Approaches based on exponential smoothing, simple moving average and machine learning
• Limitations:
  – These approaches do not work at embryonic and early stages
  – They only use a limited set of data sources

[Budi et al. 2012, Jun et al. 2010, Tseng et al. 2009]
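Two of the forecasting baselines named above can be illustrated with a minimal Python sketch: a simple moving average and simple exponential smoothing, each applied to a topic's yearly publication counts to forecast the next year. The function names, the window size, the smoothing factor alpha and the counts are hypothetical choices for illustration; this is not the implementation used in the cited works.

```python
from typing import Sequence


def simple_moving_average(series: Sequence[float], window: int) -> float:
    """Forecast the next value as the mean of the last `window` observations."""
    if not 0 < window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    return sum(series[-window:]) / window


def exponential_smoothing(series: Sequence[float], alpha: float) -> float:
    """Simple exponential smoothing; the forecast is the final smoothed level.

    level_t = alpha * y_t + (1 - alpha) * level_{t-1}, initialised with y_0.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        raise ValueError("series must not be empty")
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


# Hypothetical yearly publication counts for a single topic.
counts = [2, 3, 5, 9, 14]
print(simple_moving_average(counts, window=3))  # (5 + 9 + 14) / 3, about 9.33
print(exponential_smoothing(counts, alpha=0.5))  # 10.1875
```

With these hypothetical counts, both forecasts stay below the latest observed count of 14: such baselines extrapolate from past values and lag behind a topic that is still growing quickly, which is consistent with the limitation on embryonic and early stages.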

Planned approach

• Wider range of data sources: a comprehensive knowledge base integrating both scholarly data and social media
• Focus on discovering patterns emerging from the research dynamics:
  – For example, before the Semantic Web explicitly emerged as a research area, it was already possible to identify new, interesting dynamics involving authors from different research areas, such as knowledge representation, agent systems, hypertext and databases.
  – Creation of a model that takes into account all the discovered patterns, which may involve different entities (e.g., authors, venues, topics, communities)

Initial study
• Goal: to identify the dynamics that may indicate the emergence of a new topic
• Approach:
  – Integration of the keyword network and the semantic topic network (Klink-2, Osborne et al. @ ISWC 2015)
  – Analysis of how the sub-networks that will generate new topics evolve over time, compared with a control group of established topics:
    • Debutant group (new topics)
    • Non-debutant group (established topics)
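The analysis above tracks how keyword sub-networks evolve over time. A minimal Python sketch of that bookkeeping, assuming plain keyword co-occurrence counted per publication year, is shown below. It does not reproduce Klink-2 or the semantic topic network; the function names and the toy corpus (which borrows the research areas from the Semantic Web example) are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def yearly_cooccurrence(papers):
    """Map each year to a Counter of keyword-pair co-occurrence counts.

    `papers` is an iterable of (year, keywords) tuples; every unordered pair of
    distinct keywords in a paper adds 1 to that pair's weight for that year.
    """
    networks = defaultdict(Counter)
    for year, keywords in papers:
        for a, b in combinations(sorted(set(keywords)), 2):
            networks[year][(a, b)] += 1
    return networks


def subnetwork_weight(networks, keywords):
    """Total edge weight among `keywords` for each year, in year order."""
    selected = set(keywords)
    return {
        year: sum(w for (a, b), w in edges.items() if a in selected and b in selected)
        for year, edges in sorted(networks.items())
    }


# Hypothetical toy corpus: (publication year, keywords) per paper.
papers = [
    (1998, ["knowledge representation", "hypertext"]),
    (1999, ["knowledge representation", "hypertext", "agent systems"]),
    (1999, ["databases", "hypertext"]),
    (2000, ["knowledge representation", "agent systems", "databases"]),
]
nets = yearly_cooccurrence(papers)
print(subnetwork_weight(nets, ["knowledge representation", "hypertext", "agent systems"]))
# {1998: 1, 1999: 3, 2000: 1}
```

The result is one time series of sub-network weight per year, the kind of signal that can then be compared between the debutant and non-debutant groups.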

Preliminary results
• My analysis indicates that, for debutant topics, there is intense activity among the most co-occurring keywords, which usually correspond to established topics
• My hypothesis is that this finding can be used for the early detection of new topics on the basis of the activity of established topics
• Student’s t-test on the two distributions:
  – p-value = 2.81 × 10⁻⁸³
  – the null hypothesis can be rejected
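The significance test reported above can be sketched in Python with the standard library only. This minimal version computes the pooled-variance (Student’s) t statistic and its degrees of freedom; since the standard library has no t distribution, the two-sided p-value is approximated with the normal distribution, which is only reasonable for large samples (scipy.stats.ttest_ind gives the exact value). The function name and the per-topic activity values are hypothetical and are not the data behind the reported p-value.

```python
import math
from statistics import mean, variance


def student_t_test(x, y):
    """Two-sample Student's t-test with pooled (equal) variances.

    Returns (t, df, p_approx); p_approx is a two-sided p-value from the normal
    approximation to the t distribution, adequate only when df is large.
    """
    n1, n2 = len(x), len(y)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two observations")
    df = n1 + n2 - 2
    # Pooled variance: weighted combination of the two unbiased sample variances.
    pooled = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    if pooled == 0:
        raise ValueError("both samples are constant; t is undefined")
    t = (mean(x) - mean(y)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    # Two-sided tail probability of a standard normal: P(|Z| >= |t|).
    p_approx = math.erfc(abs(t) / math.sqrt(2))
    return t, df, p_approx


# Hypothetical activity scores per topic for the two groups.
debutant = [5.1, 6.3, 4.8, 7.0, 5.9]
non_debutant = [2.2, 3.1, 2.7, 1.9, 2.5]
t, df, p = student_t_test(debutant, non_debutant)
print(f"t = {t:.2f}, df = {df}, approximate p = {p:.2e}")
```

With only five values per group, as in this toy example, the normal approximation understates the p-value, so the exact t distribution should be preferred for small samples.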

Evaluation plan
• Quantitative: retrospective analysis and detection of historical trends
• Qualitative: informal feedback from domain experts, including senior editors and publishers at Springer, on the system’s suggestions for future trends

Reflections
• So far, my initial experiments have provided promising results that support the initial hypotheses
• The adoption of semantic technologies has helped to improve these results

Next steps
• Analyse dynamics in other networks (e.g., authors, communities and venues)
• Integration of social media data
