social phrases having impact in altmetrics - sophia

Post on 12-Apr-2017


Social Phrases Having Impact in Altmetrics - SOPHIA

An Altmetrics Analysis Prototype

Brian Davis, John Lonican, Mohan Timilsina, Waqas Khawaja, Conor Hayes

Insight@NUI Galway

Friday Presentation, 1st April 2016

Team Insight/Elsevier

Knowledge Discovery Unit (KDU)

Unit for Information Mining & Retrieval (UIMR)

Dr Conor Hayes, PI

Dr Brian Davis, Co-PI, PM

Waqas Khawaja, PhD Researcher, Research Assoc

John Lonican, Masters Researcher

Heike Vornhagen, Masters Researcher

Mohan Timilsina, Masters Researcher

Elsevier Labs/Informetrics Group

Mike Taylor, Research Scientist/Senior Product Manager, Research Metrics

Dr Lisa Colledge, Director of Research Metrics

Altmetrics

The study and use of research impact / diffusion in online tools and environments [1]

[1] Priem, J., Taraborelli, D., Groth, P., & Neylon, C. (2010). Altmetrics: A manifesto.

Shallow linguistic Analysis of government texts in the context of TP Elsevier

Current Altmetric Measures

The following are the general recommendations [1,2]:

• Views
  • Tracking the number of times an article was viewed or downloaded online

• Discussed
  • Mentions, likes and citations of an article on social media (blogs, Twitter, Google+), or in grey sources such as government white papers

• Saved
  • Article storage and bookmark counts on tools like Mendeley or Zotero

• Cited
  • Appearance in non-traditional sources, e.g. Wikipedia

[1] Lin, J., & Fenner, M. (2013). Altmetrics in evolution: defining and redefining the ontology of article-level metrics. Information Standards Quarterly, 25(2), 20.

[2] A new framework for altmetrics. (2016). Impactstory. Retrieved from http://blog.impactstory.org/31524247207/

Manual Analysis of Government Literature under Elsevier Guidance

Manual inspection with respect to:

• Avian Flu
  - Crawled data: 123 MB on disk for 486 items (PDF, HTML)
  - Consisting of news, policy documents, government literature and information pamphlets

• HPV Vaccine
  - Small corpus of US legislation with respect to HPV: 21.4 MB on disk for 75 PDF items

Summary of Analysis

• The majority of grey literature documents are NOT supported by scientific references.

• There are very few direct citations, or the research is misrepresented, as in the case of Tamiflu.

• Scientists or institutions are mentioned in the form of direct quotes or paraphrases, e.g. Scientist X from University of Y says that “…..”

• We extended our methodology beyond manual analysis in an attempt to discover useful linguistic patterns in grey (government) literature:

1. Used a standard suite of IE tools from the GATE (General Architecture for Text Engineering) framework to conduct linguistic and semantic annotation, i.e. Named Entity Extraction.

2. Indexed the corpus over the annotations using GATE MIMIR (Multiparadigm Indexing and Retrieval).

3. Searched for linguistic patterns with respect to quotes and references to scientists and institutions.

4. Collected and documented all patterns in order to engineer finite state extraction rules to automate/bootstrap this detection process.

Towards Automation

Expanding the Mimir query

• Automatically modified each Mimir query for each trigger term, using WordNet to discover possible synonyms and increase recall of matching

– Some examples using synonyms are given below with the number of search results:

– {Person} (pronounce|) - 5 docs
– {Person} (say) - 5 docs
– {Person} (said) - 28 docs
– {Person} (told) - 5 docs

– 1. {Person} (said|told)

– 2. according to [2..5] ({Organization}|{Person})

– Returns:

• that question is profoundly important, Doshi told MedPage Today, because it may offer clues to how the drug works – one of the gaps in knowledge about oseltamivir.

• according to the BMJ article's lead author, Peter Doshi, PhD, of Johns Hopkins University.
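The query expansion step can be illustrated with a minimal Python sketch. It is not the project's code: the synonym table below is a hand-written, hypothetical stand-in for a WordNet lookup, and the helper names `expand_trigger` and `build_query` are invented for illustration. It only shows how a trigger term and its synonyms could be folded into one alternation such as `{Person} (said|told)`.

```python
# Hypothetical synonym table standing in for a WordNet lookup (assumption).
SYNONYMS = {
    "said": ["told"],
    "say": ["pronounce"],
}


def expand_trigger(trigger, synonyms):
    """Return the trigger followed by its synonyms, without duplicates."""
    terms = []
    for term in [trigger] + synonyms.get(trigger, []):
        if term not in terms:
            terms.append(term)
    return terms


def build_query(annotation, trigger, synonyms):
    """Build a query string of the form '{Annotation} (term1|term2|...)'."""
    terms = expand_trigger(trigger, synonyms)
    return "{%s} (%s)" % (annotation, "|".join(terms))


print(build_query("Person", "said", SYNONYMS))  # {Person} (said|told)
```

Keeping the synonym source behind a plain dictionary makes it easy to swap in a real lexical resource later without touching the query-building logic.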

 

Example of a Greedy Pattern in Mimir

[Figure: annotated example text with the labels Academic Organisation, Scientist, Trigger Phrase and Tracking/Coreference]

Example Extraction Rule

Used the default GATE Information Extraction pipeline as a base

Added a Verb Group (VP) chunker

Wrote custom JAPE (Java Annotation Patterns Engine) grammars

    Phase: Preprocess
    Input: Lookup VG
    Options: control = all

    Rule: Preprocess1
    (
      {Lookup.majorType == triggers, Lookup within VG}
    ):bind
    -->
    :bind.Trigger = {rule = Preprocess1, type = reference}

Finding Candidate Mentions

    Rule: Reference3
    (
      ({Person} | {Organization})+
      {Trigger}
      {Split}
    ):bind
    -->
    :bind.candidateMention = {rule = Reference3, type = reference}
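To make the shape of the Reference3 pattern concrete, here is a rough Python analogue, offered only as a sketch. It assumes each sentence has already been reduced to a flat sequence of annotation type names, which is a simplification of how JAPE actually matches over an annotation graph; the function name and the single-letter encoding are invented for illustration.

```python
import re

# One letter per annotation type of interest; everything else becomes "x".
CODES = {"Person": "P", "Organization": "O", "Trigger": "T", "Split": "S"}

# One or more Person/Organization annotations, then a Trigger, then a Split.
PATTERN = re.compile(r"[PO]+TS")


def find_candidate_mentions(annotation_types):
    """Return (start, end) index pairs of spans matching the pattern."""
    encoded = "".join(CODES.get(t, "x") for t in annotation_types)
    return [(m.start(), m.end()) for m in PATTERN.finditer(encoded)]


types = ["Token", "Person", "Trigger", "Split",
         "Organization", "Person", "Trigger", "Split", "Person", "Token"]
print(find_candidate_mentions(types))  # [(1, 4), (4, 8)]
```

The returned index pairs are half-open ranges into the input list, so `types[1:4]` is the first candidate mention.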


Ongoing Work

1. Craft extraction rules and custom gazetteers.
2. Test the grammars.
3. Refine the grammars depending on accuracy.
4. Use the grammars to bootstrap a training corpus.
5. Train a hybrid classifier.
6. Use the linguistic output of the grammars for feature extraction for a supervised machine learning approach.
7. Hence we convert the problem to a sentence classification task for learning.
8. We are now resampling for a larger balanced test and development sample for training in order to create a gold standard.

Corpus Collection for building the Sophia demonstrator

The final corpus was collected from three sources using the keyword ‘Avian Influenza’.

Blogs and news were retrieved from the Spinn3r database.

Twitter was excluded at Elsevier's request (for now).

Government documents are indexed from related sites using a custom utility that uses Bing search.

Source    Documents    Size
Blogs     79,270       8.91 GB
News      10,036       190.6 MB
Govt      420          8.8 MB

Heterogeneous Graph Analysis in the context of TP Elsevier

Scopus Data

Information on entities (scientific authors, publications and their respective affiliated organizations) was collected by making multiple requests to the Scopus online API [1]. The requests were made using the following Avian Influenza related queries:

● avian influenza
● bird flu
● h5n1
● fowl plague
● grippe aviaire
● avian flu

[1] http://dev.elsevier.com
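As an illustration of how such requests might be assembled, the sketch below builds one Scopus Search API URL per query term. It makes no network calls. Several details are assumptions rather than facts from the slides: the `TITLE-ABS-KEY` field restriction, the paging values, and the helper name `build_search_url`. A real request would also need an API key, typically sent in the `X-ELS-APIKey` header.

```python
from urllib.parse import parse_qs, urlencode, urlparse

SCOPUS_SEARCH_URL = "https://api.elsevier.com/content/search/scopus"

# Query terms listed on the slide.
QUERIES = ["avian influenza", "bird flu", "h5n1",
           "fowl plague", "grippe aviaire", "avian flu"]


def build_search_url(term, start=0, count=25):
    """Build a search URL for one term (field restriction is an assumption)."""
    params = {
        "query": 'TITLE-ABS-KEY("%s")' % term,
        "start": start,   # offset of the first result to return
        "count": count,   # number of results per page
    }
    return SCOPUS_SEARCH_URL + "?" + urlencode(params)


urls = [build_search_url(term) for term in QUERIES]

# Decode one URL again to check what would be sent.
decoded = parse_qs(urlparse(urls[1]).query)
print(decoded["query"][0])  # TITLE-ABS-KEY("bird flu")
```

Collecting a full result set would mean repeating each request with an increasing `start` value until no further entries are returned.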


Use of the Scopus Data

The data retrieved through the requests to the Scopus online API was stored in a Neo4j graph as entity nodes with relevant edges. The entities were also indexed using Lucene for use in Sophia.

Using DOI and canonical URL information from paper nodes and from Spinn3r Web_Entry nodes (news and web blogs), webmention links were created between academic papers and online sources.
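The DOI half of this linking step can be sketched in a few lines of Python. This is an assumption-laden illustration, not the project's implementation: it works on plain dictionaries instead of Neo4j nodes, it ignores canonical URLs, the DOI regular expression is a common heuristic, and the identifiers and DOI in the sample data are made up.

```python
import re

# Heuristic DOI pattern (assumption): "10.", a registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s\"<>]+")


def normalise_doi(doi):
    """Lower-case a DOI and strip trailing punctuation picked up from prose."""
    return doi.strip().lower().rstrip(".,;)")


def link_by_doi(papers, web_entries):
    """Return (web_entry_id, paper_id) pairs where the entry text cites the DOI.

    papers: dict mapping paper id -> DOI
    web_entries: dict mapping web entry id -> page text
    """
    by_doi = {normalise_doi(doi): pid for pid, doi in papers.items()}
    links = []
    for entry_id, text in web_entries.items():
        for found in DOI_PATTERN.findall(text):
            paper_id = by_doi.get(normalise_doi(found))
            if paper_id is not None:
                links.append((entry_id, paper_id))
    return links


# Made-up sample data for illustration only.
papers = {"paper1": "10.1234/example.2011.001"}
web_entries = {
    "web1": "Study published today (doi:10.1234/EXAMPLE.2011.001).",
    "web2": "A post about bird flu with no reference.",
}
print(link_by_doi(papers, web_entries))  # [('web1', 'paper1')]
```

In a graph store, each returned pair would become one edge between a web entry node and a paper node.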


Data: Spinn3r

[Figure: Spinn3r data, Nov 2010 - July 2011, drawn from the World Wide Web: mainstream news and weblogs]

Graph Model: Spinn3r Data Model

Graph Model: Scopus Data Model

Graph Model

Graph Model: Integration of Spinn3r and Scopus graph data (DOI approach)

[Figure: Spinn3r graph and Scopus graph joined through DOI. Labels shown: NewsItem, BlogsItem, WebEntry, Scientist, Organization, Publications, Author, directLink, author, hasAuthor, affiliated, DOI]

Graph Model: Integration of Spinn3r and Scopus graph data (Mention approach)

Example of Mention and DOI in Mainstream News

[Figure: sciencedaily.com (mainstream news) page with MENTION and DOI marked]

Graph Model

[Figure: graph with a "mention" edge]

Graph Metrics

Measuring the centrality of academic entities in a heterogeneous graph.

● Scientist
● Publications
● Venues
● Organizations/Institutions

Measuring the influence of a scientist

Basic measure = count of the number of mentions.

Measuring the influence of a scientist

Basic measure = Unweighted Node Count

Measuring the influence of a scientist

PageRank Score

$$\text{Scientist(influence)} = \sum_{i=1}^{N} PR_i$$

Measuring the influence of a scientist

Authority Score

$$\text{Scientist(influence)} = \sum_{i=1}^{N} Auth_i$$

Measuring the influence of a scientist

Katz Centrality

$$\text{Scientist(influence)} = \sum_{k=1}^{\infty} \sum_{j=1}^{N} \alpha^{k} (A^{k})_{ji}$$

Measuring the impact of a scientist

Log Based Weight

$$\text{Scientist(influence)} = \sum_{i=1}^{N} \log\left[\frac{\text{indegree}_i + 1}{\text{outdegree}_i + 1} + 1\right]$$
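A small worked example may help with the log based weight. The slides do not define the summation index, so this sketch assumes that i ranges over the N nodes that link to the scientist, that the logarithm is natural, and that the graph is a plain list of directed (source, target) edges. The node names are made up.

```python
import math


def degrees(edges):
    """Return (indegree, outdegree) dictionaries for a directed edge list."""
    indeg, outdeg = {}, {}
    for src, dst in edges:
        outdeg[src] = outdeg.get(src, 0) + 1
        indeg[dst] = indeg.get(dst, 0) + 1
    return indeg, outdeg


def mention_count(scientist, edges):
    """Basic measure: number of edges pointing at the scientist."""
    return sum(1 for _, dst in edges if dst == scientist)


def log_based_weight(scientist, edges):
    """Sum log[(indegree_i + 1) / (outdegree_i + 1) + 1] over mentioning nodes."""
    indeg, outdeg = degrees(edges)
    sources = [src for src, dst in edges if dst == scientist]
    return sum(
        math.log((indeg.get(s, 0) + 1) / (outdeg.get(s, 0) + 1) + 1)
        for s in sources
    )


# Made-up toy graph: two web entries mention the scientist,
# and one web entry also links to the other.
edges = [
    ("blog_a", "scientist_x"),
    ("news_b", "scientist_x"),
    ("blog_a", "news_b"),
]
print(mention_count("scientist_x", edges))     # 2
print(log_based_weight("scientist_x", edges))  # log(4/3) + log(2)
```

Here blog_a has indegree 0 and outdegree 2, contributing log(4/3), while news_b has indegree 1 and outdegree 1, contributing log(2). Under these assumptions, a mention from a node that is itself linked to counts for more than a mention from a node that only links outwards.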


Comparison of computed metrics with H-Index

Metric                   p-value (α=0.05)    Significance
Mention Count            1.09e-10            0.35***
Unweighted Node Count    1.38e-14            0.38***
PageRank                 3.85e-10            0.34***
Authority Scores         6.44e-08            0.29***
Katz Centrality          1.4e-14             0.42***
Log Based Weight         2.2e-16             0.45***

N=320, *** Highly Significant

Current System


SOPHIA – Putting it all together!

SOPHIA – Social Phrases Having impact In Altmetrics

SOPHIA builds altmetric networks of researchers and institutions to develop our understanding of how research outcomes are propagated in society, and to experiment with metrics that quantify the authority, centrality and influence of researchers and institutions for a given topic.

Representation: Force Graph

Representation: Entity local network

Annotated Document Content

Representation: Simple Term Cloud

Software Demonstration Time!

Further information

Visit the project website: http://elsevier.kdu.insight-centre.org/

Video Demo of Sophia version 2: https://www.youtube.com/watch?v=a3voWUXkm9s
