Early Detection and Forecasting of Research Trends
Angelo Antonio Salatino @angelosalatino
Advisors: Prof. Enrico Motta
Dr. Francesco Osborne
ISWC 2015 – Doctoral Consortium
Problem
Who cares?
• Researchers: following the evolution of the research environment
• Academic publishers: promoting up-to-date and interesting contents
• Companies: early intelligence on potentially important research trends to remain at the forefront of innovation
• Funding bodies: improved understanding of the research landscape
State of the art: Trend detection
• Topic evolution using bibliometric analysis:
– Content analysis
  • Topic extraction
  • Main terms in documents
– Citation analysis
– Main limitation: cannot detect new trends early enough in the lifecycle
[Wu et al. 2011, Bolelli et al. 2009, He et al. 2009]
State of the art: Forecasting impact
• Impact based on number of publications and authors associated with topics
• Approaches based on exponential smoothing, simple moving average and machine learning
• Limitations:
– These approaches don’t work at embryonic and early stages
– They only use a limited set of data sources
[Budi et al. 2012, Jun et al. 2010, Tseng et al. 2009]
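The two statistical baselines named on this slide can be sketched as follows. This is a minimal illustration, not the method of the cited works: it assumes the slide's averaging approach is a simple moving average, and the function names, the `window` and `alpha` parameters, and the input (a list of yearly publication counts for one topic) are all hypothetical.

```python
def moving_average_forecast(counts, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if window < 1 or len(counts) < window:
        raise ValueError("need at least `window` observations")
    return sum(counts[-window:]) / window


def exponential_smoothing_forecast(counts, alpha=0.5):
    """Forecast the next value with simple exponential smoothing.

    `alpha` (assumed here, 0 < alpha <= 1) weights recent observations:
    higher values react faster to recent changes.
    """
    if not counts:
        raise ValueError("need at least one observation")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = counts[0]
    for count in counts[1:]:
        level = alpha * count + (1 - alpha) * level
    return level


# Made-up yearly publication counts for one topic (illustrative only).
yearly_counts = [2, 4, 6, 8]
print(moving_average_forecast(yearly_counts, window=2))
print(exponential_smoothing_forecast(yearly_counts, alpha=0.5))
```

Both functions need a history of counts to work from, which illustrates the limitation stated above: a topic at the embryonic stage has little or no such history.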
Planned approach
• Wider range of data sources: comprehensive knowledge base integrating both scholarly data and social media
Planned approach
• Focus on discovering patterns emerging from the research dynamics:
– For example, before the Semantic Web emerged explicitly as a research area, we could identify new interesting dynamics involving authors from different research areas such as knowledge representation, agent systems, hypertext and databases.
– Creation of a model that takes into account all the discovered patterns, which may involve different entities (e.g., authors, venues, topics, communities)
Initial study
• Goal: To identify the dynamics that may indicate the emergence of a new topic
• Approach:
– Integration of Keywords network and Semantic topics network (Klink-2, Osborne et al. @ ISWC 2015)
– Analysis of the evolution in time of sub-networks that will generate new topics vs. a control group of established topics.
  • Debutant group (new topics)
  • Non-debutant group (established topics)
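One way to picture the sub-network analysis is sketched below. It is a toy illustration under stated assumptions, not the study's actual procedure: the input format (pairs of year and keyword list), both function names, and the activity measure (total co-occurrence weight among a chosen keyword set per year) are hypothetical, and the Klink-2 semantic topic network is not modelled.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_by_year(papers):
    """Build one keyword co-occurrence network per year.

    `papers` is an iterable of (year, keywords) pairs. Each network is a
    Counter mapping a sorted keyword pair to the number of papers in that
    year listing both keywords.
    """
    networks = {}
    for year, keywords in papers:
        edges = networks.setdefault(year, Counter())
        for a, b in combinations(sorted(set(keywords)), 2):
            edges[(a, b)] += 1
    return networks


def subnetwork_activity(networks, keywords):
    """Total co-occurrence weight among `keywords`, per year (assumed measure)."""
    selected = set(keywords)
    return {
        year: sum(w for (a, b), w in edges.items() if a in selected and b in selected)
        for year, edges in sorted(networks.items())
    }


# Made-up papers (illustrative only).
papers = [
    (2000, ["hypertext", "databases"]),
    (2001, ["hypertext", "databases", "agent systems"]),
    (2001, ["databases", "hypertext"]),
]
networks = cooccurrence_by_year(papers)
print(subnetwork_activity(networks, ["hypertext", "databases", "agent systems"]))
```

Running the same measure over sub-networks from the debutant group and from the non-debutant group would give two sets of yearly values that can then be compared.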
Preliminary results
• My analysis indicates that for Debutant Topics there is intense activity among the most co-occurring keywords, which would normally be established topics
• My hypothesis is that I can use this understanding for the early detection of new topics on the basis of the activity of established topics
Student’s t-test on the two distributions:
• p-value = 2.81 × 10^-83
• null hypothesis can be rejected
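For reference, the statistic behind a two-sample Student's t-test can be computed as in the sketch below. It assumes the equal-variance (pooled) form of the test; the slide does not say which variant was used, and the sample values are made up rather than taken from the study. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def students_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for the pooled-variance t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two values")
    # Pooled estimate of the common variance of the two groups.
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("both samples are constant; t is undefined")
    t = (mean(sample_a) - mean(sample_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, n_a + n_b - 2


# Made-up activity values for the two groups (illustrative only).
debutant = [4.0, 5.0, 6.0]
non_debutant = [1.0, 2.0, 3.0]
t_stat, dof = students_t(debutant, non_debutant)
print(t_stat, dof)
```

Turning the statistic into a p-value requires the t distribution with `dof` degrees of freedom; with SciPy available, `scipy.stats.ttest_ind` performs the whole test in one call.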
Evaluation plan
• Quantitative: retrospective analysis and detection of historical trends
• Qualitative: informal feedback from domain experts, including senior editors and publishers at Springer, on the system suggestions for future trends
Reflections
• So far, my initial experiments have provided promising results which confirm the initial hypotheses
• The adoption of semantic technologies has helped to improve these results
Next steps
• Analyse dynamics in other networks (e.g., authors, communities and venues)
• Integration of social media data