Primary Data Archiving and Citation in Biomedical Research: a DCIP Progress Report
Tim Clark, PhD
Harvard Medical School & Massachusetts General Hospital
BD2K All Hands Meeting, Bethesda MD
November 29-30, 2016
Background
• Reproducibility crisis: Science policy makers & funders concerned.
• Policy studies: Recommend primary data archiving and citation.
• BD2K⇒FAIR: “Facilitate broad use of biomedical digital assets by making them discoverable, accessible and citable”.
• Opportunity: Technologies & recommendations now in place.
• bioCADDIE requirements: Need all the primary data + metadata.
Three Main Reasons to Cite Data
⇒ Better Science
⇒ Re-use & discovery
⇒ Cure Diseases
DCIP Goals
• Facilitate data citation in biomedical research as standard practice w/ common information models.
• Coordinate efforts amongst publishers, repositories, identifier services, bioCADDIE & NIH.
• Support development of the NIH bioCADDIE data discovery index software and ecosystem.
What is DCIP Based On?
• CODATA, National Academies & NIH recommendations.
• Joint Declaration of Data Citation Principles (JDDCP).
• Starr et al. 2015 “Achieving Human and Machine Accessibility of Cited Data” https://doi.org/10.7717/peerj-cs.1.
• Existing & emerging standards e.g. JATS, schema.org, DATS.
• Community participation by publishers, repositories, identifier and metadata services, standards groups.
Approach
• Coordinate early adopter best practices.
• Help establish standard benchmark implementations.
• Report on lessons learned to the community.
• Focus on primary biomedical research data.
• Make cited data discoverable and reusable.
Major Expected Outputs
• Publishers: Publisher’s Roadmap.
• Repositories: Repositories Roadmap.
• Identifiers: Harmonized compact ID resolution.
• FAQs: Guidance for common implementations.
[Diagram: Data Discovery Indices (bioCADDIE)]
Publishers Roadmap Development
∙ Leads: Amye Kenall & Helena Cousijn
∙ Participants: Elsevier, SpringerNature, eLife, PLoS, Frontiers, Wiley, et al.
∙ Roadmap: Now in final draft form. To be submitted to Nature Scientific Data.
∙ Implementation: Elsevier has implemented the JDDCP in 1,800 journals based on the Roadmap. SpringerNature plans to follow suit shortly across all of its journals. Stay tuned ...
Repository Roadmap Development
Christian Haselgrove, Ian Fore, Philippe Rocca-Serra, Andy Jenkinson
Leads: Martin Fenner (DataCite), Merce Crosas (Dataverse)
Landing Page Metadata
Data Citation Metadata Element | Dublin Core | Schema.org | DataCite | DATS
Dataset Identifier | identifier | @id / resource / itemid* | identifier | identifier
Title | title | name | title | title
Creator | creator | author | creator | creator
Data repository or archive | publisher | publisher | publisher | publisher
Publication Date | date | datePublished | publicationYear | date
Version | <not defined> | version | version | version
Type | type | type | resourceTypeGeneral | type
* The name of the ID field depends on the schema.org serialization format: @id in JSON-LD, resource in RDFa, and itemid in microdata. JSON-LD is the preferred serialization for schema.org elements.
Landing Page Data Citation Metadata Should Be Human and Machine Readable
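To illustrate what "machine readable" can mean for a landing page, here is a minimal sketch that emits schema.org Dataset metadata as JSON-LD, following the schema.org column of the landing page metadata table. The function names and all example values (including the DOI) are hypothetical; this is not the Roadmap's reference implementation.

```python
# Minimal sketch: schema.org Dataset metadata serialized as JSON-LD for an HTML
# landing page. Function names and example values are HYPOTHETICAL placeholders.
import json


def dataset_jsonld(identifier_url, title, creators, publisher, date_published, version):
    """Build a schema.org Dataset description as a JSON-serializable dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",                      # Type
        "@id": identifier_url,                   # Dataset Identifier (JSON-LD form)
        "name": title,                           # Title
        "author": [{"@type": "Person", "name": n} for n in creators],  # Creator
        "publisher": {"@type": "Organization", "name": publisher},     # Data repository or archive
        "datePublished": date_published,         # Publication Date
        "version": version,                      # Version
    }


def landing_page_script(metadata):
    """Wrap the JSON-LD in the <script> element that machine readers look for in HTML."""
    return '<script type="application/ld+json">\n' + json.dumps(metadata, indent=2) + "\n</script>"


if __name__ == "__main__":
    meta = dataset_jsonld(
        "https://doi.org/10.5555/example-dataset",  # hypothetical DOI
        "Example Dataset",
        ["Jane Doe"],
        "Example Repository",
        "2016-11-29",
        "1.0",
    )
    print(landing_page_script(meta))
```

The same page can render these values as visible text for human readers, so one metadata record serves both audiences.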
Repository Metadata Status
• Required and supplemental metadata defined with alternative vocabularies and serializations specified.
• Backward and forward compatibility modes defined.
• Integration w/ ref. managers (EndNote, Zotero, CSL).
• Roadmap in near-final draft.
• Moving forward: outreach to repositories.
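One possible way to connect landing-page metadata to reference managers is to also expose it as CSL-JSON, the item format consumed by Citation Style Language processors. The sketch below assumes that approach; it is not taken from the Roadmap, and the function name and example values are hypothetical.

```python
# Minimal sketch (ASSUMED approach): map data citation metadata elements to a
# CSL-JSON item of type "dataset". Function name and values are HYPOTHETICAL.
import json


def to_csl_json(doi, title, creators, publisher, year, version):
    """creators is a list of (family, given) name pairs."""
    return {
        "id": doi,                                   # CSL items need a local id
        "type": "dataset",
        "DOI": doi,                                  # Dataset Identifier
        "title": title,                              # Title
        "author": [{"family": f, "given": g} for f, g in creators],  # Creator
        "publisher": publisher,                      # Data repository or archive
        "issued": {"date-parts": [[year]]},          # Publication Date (year only)
        "version": version,                          # Version
    }


if __name__ == "__main__":
    item = to_csl_json("10.5555/example-dataset", "Example Dataset",
                       [("Doe", "Jane")], "Example Repository", 2016, "1.0")
    print(json.dumps([item], indent=2))  # CSL-JSON files hold a list of items
```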
Compact Identifier Resolution
DCIP Identifiers Workshop, June 2, 2016, Harvard University, Cambridge MA
John Kunze (CDL), Niall Beard (Manchester), Tim Clark (Harvard), Nick Juty (EBI), Ian Fore (NIH), Julie McMurry (UCSB), Jeff Grethe (UCSD), Rafa Jimenez (ELIXIR), Sarala Wimalaratne (EBI)
Compact Identifier Resolution
• International collaboration of EBI, CDL, Prefix Commons & bioCADDIE throughout the past year.
• Technical approach for common prefix registry has been agreed and specification document is in near-final draft.
• Implementation is near-complete at both EBI and CDL.
• Extensive ongoing discussions with DataCite.
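The core idea of compact identifier resolution is that a short "prefix:accession" string is looked up in a shared prefix registry, which supplies an accession pattern and a URL template. The sketch below shows that lookup under stated assumptions: the hard-coded registry, its two entries, and the "{id}" template syntax are hypothetical stand-ins, not the EBI/CDL specification or registry format.

```python
# Minimal sketch of compact identifier (prefix:accession) resolution.
# PREFIX_REGISTRY is a HYPOTHETICAL, hard-coded stand-in for a shared prefix
# registry; its entries and URL templates are illustrative only.
import re

PREFIX_REGISTRY = {
    # prefix -> (accession regex, URL template with an {id} placeholder)
    "taxonomy": (r"\d+", "https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id={id}"),
    "pdb": (r"[0-9][A-Za-z0-9]{3}", "https://www.rcsb.org/structure/{id}"),
}


def parse_compact_id(compact_id: str) -> tuple[str, str]:
    """Split at the first colon; prefixes are matched case-insensitively."""
    prefix, sep, accession = compact_id.partition(":")
    if not sep or not prefix or not accession:
        raise ValueError(f"not a compact identifier: {compact_id!r}")
    return prefix.lower(), accession


def resolve(compact_id: str) -> str:
    """Return a resolvable URL, after checking the accession against the prefix pattern."""
    prefix, accession = parse_compact_id(compact_id)
    if prefix not in PREFIX_REGISTRY:
        raise KeyError(f"unknown prefix: {prefix!r}")
    pattern, template = PREFIX_REGISTRY[prefix]
    if not re.fullmatch(pattern, accession):
        raise ValueError(f"accession {accession!r} does not match the {prefix!r} pattern")
    return template.format(id=accession)


if __name__ == "__main__":
    print(resolve("taxonomy:9606"))
    print(resolve("PDB:2gc4"))
```

Validating the accession before building the URL lets a resolver reject malformed identifiers instead of redirecting to a dead page; keeping the registry shared is what lets independent resolvers return the same answer.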
FAQ / Primer Group
UCSD, California Digital Library
• Communicates DCIP outcomes.
• Major Deliverables:
• FAQs for Repositories & Publishers ✔
• Data Citation Primer ✔
• Website in design phase; it will aggregate all specifications, roadmaps, and training materials, and provide ongoing implementation status.
Participants
And you!
Conclusions
• Publisher & repository Roadmap documents, the Primer, and FAQs have been created & are in final editing.
• Leading publishers are taking data citation very seriously.
• Elsevier will shortly announce it has implemented JDDCP for 1,800 journals. SpringerNature plans to follow suit soon.
• Compact identifier resolution implemented at CDL & EBI.
• Expanding repository outreach is a key issue for 2017.