
The Metadata [R]evolution: Transformative Opportunities September 18, 2013

Presented by

Using VIVO, Scopus, and PubMed to disambiguate Weill Cornell authors

Paul Albert
Weill Cornell Medical College

Original approach for managing faculty publications: rely on researchers or their proxies to manually enter publications.

Does this work?

Researchers’ response to email requesting copy of CV

Why don’t our researchers care?

Failing to rigorously maintain an accurate list of publications is a rational choice. Time spent on maintaining publications bears a perceived, but more often real, opportunity cost.

Revised approach for managing faculty publications: use data from Scopus and PubMed to maintain profiles for them

Our publication ingest workflow

1. Librarian formulates queries and stores them in a Google Doc.

2. Developer queries the Scopus API and translates the result into XML.

3. Use the DOI and PMID to look up the record in PubMed.

4. Combine metadata from both sources as a candidate for ingest.

5. If duplicate, disregard. If new, ingest.

6. Re-ingest temporal data such as citation count.
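The duplicate check and the re-ingest of temporal data can be sketched roughly as follows. This is a minimal illustration, not the Weill Cornell implementation: the store is a plain dictionary standing in for VIVO, and the names `record_key`, `ingest`, and the `citation_count` field are hypothetical. It assumes a candidate is identified by its DOI, falling back to its PMID, so a paper seen once with a DOI and once with only a PMID would not be matched.

```python
def record_key(candidate):
    """Build a deduplication key from the DOI, falling back to the PMID."""
    doi = candidate.get("doi")
    if doi:
        return "doi:" + doi.strip().lower()
    pmid = candidate.get("pmid")
    if pmid:
        return "pmid:" + str(pmid).strip()
    return None


def ingest(store, candidate):
    """Apply one candidate record to the store (a dict, assumed stand-in for VIVO).

    Returns 'ingested' for a new record, 'updated' when only the citation
    count of a known record changed, 'duplicate' when nothing changed, and
    'skipped' when the candidate has no usable identifier.
    """
    key = record_key(candidate)
    if key is None:
        return "skipped"
    existing = store.get(key)
    if existing is None:
        store[key] = dict(candidate)
        return "ingested"
    # Known record: disregard it, except for temporal data such as citation count.
    new_count = candidate.get("citation_count")
    if new_count is not None and new_count != existing.get("citation_count"):
        existing["citation_count"] = new_count
        return "updated"
    return "duplicate"
```

Calling `ingest` repeatedly with the same candidate list is safe under these assumptions: known records are left alone unless their citation count has moved.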

What is ingested from where?

Scopus

• Full Author Names

• Article Title

• Journal Title

• DOI

• PMID (PubMed Identifier)

• Date of publication

• ISSN

• Citation count

PubMed

• Abstract

• Medical Subject Headings (MeSH)

• Funding

• PubMed Central Identifier

• Status (e.g., in process)

• Second ISSN

• Language

• Journal abbreviation

• Publication type
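One way to picture the split between the two sources is a merge step that takes each field from exactly one of them. The sketch below is an assumption-laden illustration: the field names (`mesh_terms`, `pmcid`, and so on) and the function `merge_records` are hypothetical, and both inputs are assumed to be dictionaries that have already been parsed from the Scopus and PubMed responses.

```python
# Hypothetical field names mirroring the two lists above.
SCOPUS_FIELDS = (
    "authors", "article_title", "journal_title", "doi", "pmid",
    "publication_date", "issn", "citation_count",
)
PUBMED_FIELDS = (
    "abstract", "mesh_terms", "funding", "pmcid", "status",
    "second_issn", "language", "journal_abbreviation", "publication_type",
)


def merge_records(scopus, pubmed=None):
    """Combine one Scopus record and its PubMed match into an ingest candidate.

    Each field is read only from the source assigned to it; fields that are
    missing, or all PubMed fields when no match was found, are set to None.
    """
    pubmed = pubmed or {}
    merged = {name: scopus.get(name) for name in SCOPUS_FIELDS}
    for name in PUBMED_FIELDS:
        merged[name] = pubmed.get(name)
    return merged
```

Keeping the provenance of each field fixed in one place makes it straightforward to see, for any value in a profile, which source it came from.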

A key consideration: will a publication ingest be institution-centric or person-centric?

Query by institution

• Easier to identify hits

• Easier for institutional reporting, especially year to year comparisons

• Assertions of co-author identity can be unclear

Query by person

• More laborious – need an internal source for people

• Often accounts for publications w/ no or incorrect affiliation

• Accounts for previous affiliations

Affiliation ID = “Weill” Author ID = “8256757” x 1300

Scopus commits two varieties of disambiguation errors

Splitting - one person, multiple author IDs; relatively easy to recover from

Lumping - multiple people, one author ID; relatively hard to recover from
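Given a gold-standard list of which person each author ID's publications really belong to, the two error types can be detected mechanically. The following is a minimal sketch under that assumption; the input format (pairs of person and author ID) and the function name `classify` are hypothetical, and this is not necessarily how the figures in this talk were produced.

```python
from collections import defaultdict


def classify(pairs):
    """Label each person as 'ideal', 'splitting', 'lumping', or 'both'.

    pairs: iterable of (person, author_id) tuples from a gold standard,
    one tuple for every author ID that holds that person's publications.
    """
    ids_by_person = defaultdict(set)
    people_by_id = defaultdict(set)
    for person, author_id in pairs:
        ids_by_person[person].add(author_id)
        people_by_id[author_id].add(person)

    labels = {}
    for person, ids in ids_by_person.items():
        split = len(ids) > 1  # one person, multiple author IDs
        lumped = any(len(people_by_id[a]) > 1 for a in ids)  # shared author ID
        if split and lumped:
            labels[person] = "both"
        elif split:
            labels[person] = "splitting"
        elif lumped:
            labels[person] = "lumping"
        else:
            labels[person] = "ideal"
    return labels
```

For example, a person whose only author ID is also used for someone else's papers is labelled `lumping`, while a person with two author IDs of their own is labelled `splitting`.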

How accurate is Scopus at author disambiguation c. 2013? Gold standard = librarian judgment

• Ideal – one-to-one relation between Scopus author ID and person: n=369

• Splitting – more than one author ID per person: n=707

• Lumping – more than one person per author ID: n=86

• Both errors: n=23

Two author disambiguation methods against a gold standard

[Figure: the two methods compared, labelled "Name query" and "Scopus"]

From Johnson et al. Submitted. “Automatic generation of investigator bibliographies for institutional research networking systems.”

“Special queries” can compensate for lumping errors

Examples of special queries

(AU-ID(7405920800)) AND (AF-ID(60007997) OR AF-ID(60009470) OR AF-ID(60019868))

(AU-ID(7402763146)) AND (AF-ID(60007997) OR AF-ID(60019868) OR AF-ID(60018043) OR AF-ID(60007997) OR AF-ID(60019868) OR AF-ID(100366692) OR AF-ID(60018043) OR AF-ID(60002339) OR AF-ID(60009343) OR AF-ID(60024541) OR AF-ID(60025843) OR AF-ID(60027565))
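Queries of this shape are easy to get wrong by hand, for instance by listing the same affiliation ID twice. A small helper can assemble them instead. The sketch below is illustrative only: the function name `build_special_query` is hypothetical, and it assumes the only parts needed are the `AU-ID` and `AF-ID` clauses in the form shown in the examples above.

```python
def build_special_query(author_id, affiliation_ids):
    """Build an author query restricted to a set of affiliation IDs.

    Affiliation IDs are stripped of whitespace and deduplicated while
    keeping their original order. With no affiliations, the query falls
    back to the author ID alone.
    """
    unique = []
    for affiliation_id in affiliation_ids:
        cleaned = str(affiliation_id).strip()
        if cleaned and cleaned not in unique:
            unique.append(cleaned)

    author_clause = f"(AU-ID({author_id}))"
    if not unique:
        return author_clause
    affiliation_clause = " OR ".join(f"AF-ID({a})" for a in unique)
    return f"{author_clause} AND ({affiliation_clause})"
```

Called with the author ID and the three affiliation IDs of the first example, it reproduces that query string exactly; because `OR` is unaffected by repeated terms, deduplicating the list does not change what a query matches.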

How can VIVO data address pressing institutional needs in order to strengthen its viability?

NIH Open Access policy compliance

WCMC authors who have received NIH funding but haven’t deposited pre-prints in PubMed Central receive a personalized notice (a “nastygram”).

[Chart: monthly values from March through September; vertical axis from 0.78 to 0.90]

Co-author network and expertise of arbitrary group of faculty

Suggested publications in annual faculty review tool

Administrators are avid consumers of institutional data.

Administrators want reporting tools (especially about publications) that:

• Have current data

• Are easy to use

• Allow for sophisticated queries

VIVO Dashboard now under development

Expertise recommendation tool also under development

Acknowledgements

Eliza Chan and Prakash Adekkanattu - developers at Weill Cornell

Don Carpenter and Zeheng Wang - VIVO Dashboard developers

Jie Lin - Expertise Recommendation Tool developer

Drew Wright - publications help and NIH Access Policy compliance