persistent identifier services and their metadata by john kunze
Persistent Identifier Services and their Metadata
John Kunze, California Digital Library
Decoding the title
persistent identifier | services | and their metadata
things | actions | descriptors
nouns | verbs | adjectives
Preserving and serving scholarly communication around data (Context: scholarly research data)
An identifier is not a string of characters
An identifier is an association between a string and a thing.
An association is an opinion asserted by an authority.
Example 1: http://allrecipes.com/recipe/sauteed-fiddleheads
Example 2: 4CF3-57AB-2481-651D-D53D-Q
http://dx.doi.org/10.5072/4CF3-57AB-2481-651D-D53D-Q
http://dx.doi.org/10.5240/4CF3-57AB-2481-651D-D53D-Q
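The two URLs above carry the same character string under two different DOI prefixes. One way to picture "an identifier is an association asserted by an authority" is a lookup keyed on the authority and the string together. The sketch below is only an illustration: the class, the function names, and the "thing A / thing B" targets are hypothetical placeholders, not data from any real resolver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    authority: str  # who asserts the association (here, a DOI prefix)
    string: str     # the bare character string
    thing: str      # placeholder label for what the string is said to identify


def build_registry(associations):
    """Index by (authority, string); the bare string alone is not a key."""
    return {(a.authority, a.string): a.thing for a in associations}


SUFFIX = "4CF3-57AB-2481-651D-D53D-Q"

# Hypothetical targets: the slides do not say what either DOI points to.
registry = build_registry([
    Association("10.5072", SUFFIX, "thing A (placeholder)"),
    Association("10.5240", SUFFIX, "thing B (placeholder)"),
])


def resolve(authority, string):
    """Return the asserted thing, or None if this authority asserts nothing."""
    return registry.get((authority, string))
```

With this model, `resolve("10.5072", SUFFIX)` and `resolve("10.5240", SUFFIX)` return different things even though the string is identical, and an authority that has made no assertion resolves to nothing.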
Identifier schemes (v1)
• URL (Uniform Resource Locator)
  • the first time poor id management is blamed on syntax
• URN (Uniform Resource Name)
  • first attempt to correct poor id management with syntax
• Handle
  • second attempt to correct poor id management with syntax
• DOI (Digital Object Identifier)
  • third attempt to correct poor id management with syntax
• ARK (Archival Resource Key)
  • attempt to let id management be queryable (not yet realized)
Identifier schemes (v2)
• URL (Uniform Resource Locator)
  • world’s first actionable id, now underlying all other types
• URN (Uniform Resource Name)
  • open infrastructure, not fully realized globally
• Handle
  • closed infrastructure, fully realized globally
• DOI (Digital Object Identifier)
  • CrossRef enforces good id management; DataCite is learning
• ARK (Archival Resource Key)
  • open infrastructure, realized locally and globally
If DOIs won, why talk about non-DOIs?
• Cost
• Open access
• Changing nature of the DOI
• Flexibility
Types of identifier services
• Repository – parking the bits
• Data-aware dissemination
  • more than just returning parked bits
• Citation management for end user researchers
• Research tracking – measuring use and impact
• Identifier creation, management, and resolution
Many service tools, many APIs
Repository Tools
• arXiv *
• Dataverse *
• Fedora/Hydra
• DSpace *
• EPrints
• DataONE
• Merritt/Stash
• figshare
• Zenodo
Citation Management
• Mendeley
• Zotero
Metrics and Tracking
• Altmetric
• Impactstory
• Thomson Reuters Data Citation Index
• Elsevier Scopus
API concepts
Application Programming Interface (API)
• how software talks to a service
• unlike a Graphical User Interface (GUI)
• more like a Command Line Interface (CLI)
APIs and CLIs use language constructs
• Verbs, nouns, and qualifiers are “words”, and
• words form commands/requests/responses,
• which form scripts and programs.
APIs are metadata sentences
A command line interface powering an API interaction
$ sort mydata > sorted_data
$ grep Smith sorted_data
Smith, Sally 2014-04-01 406B
Wong, Frank 2013-11-28 334
$ wget --user=sam --no-check-certificate \
    "https://n2t.net/a/ezid/b?set cost 25.50"
status: ok
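The wget request above reads like a short sentence: a verb (set), a noun (cost), and a qualifier (25.50), carried in the query part of the URL. As a rough sketch of that idea, the Python below assembles and takes apart such a request. The function names are hypothetical, the percent-encoding of spaces is an assumption about how a client would send the words, and nothing here contacts the service.

```python
from urllib.parse import quote, unquote, urlsplit


def build_request(base, verb, noun, value):
    """Join verb, noun, and qualifier into one 'sentence' in the URL query."""
    sentence = " ".join([verb, noun, value])
    # Spaces are percent-encoded so the sentence survives as a single query.
    return base + "?" + quote(sentence)


def parse_request(url):
    """Recover (verb, noun, qualifiers) from a URL built by build_request."""
    words = unquote(urlsplit(url).query).split()
    verb, noun, *qualifiers = words
    return verb, noun, qualifiers


url = build_request("https://n2t.net/a/ezid/b", "set", "cost", "25.50")
verb, noun, qualifiers = parse_request(url)
```

Treating the request as words rather than as an opaque string is what lets the same vocabulary be reused across commands, scripts, and programs.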
Problem: traditional standardization
• Change by committee is ugly, costly, and slow
• Example: Dublin Core, 15 cross-domain terms
Photo credit: European Parliament Technology - DG ITEC @ flickr
An alternate metadata universe
• Vision: one dictionary, one namespace
• All research domains, any part of “metadata speech”
  • Names, values, units, relationships, ...
• Search for terms, comment on terms, add terms, edit your terms, API for automated access
• All terms with globally unique persistent identifiers
• Available at yamz.net (yet another metadata zoo)
YAMZ.net dictionary sociology
• Crowd-sourced evolving vernacular terms, stable canonical terms, and deprecated terms
• Use evolving terms depending on your risk tolerance
• Reputation-based (gaming-resistant) voting means strong terms rise, weak terms decline
Applying lessons learned from Wikipedia, the Internet-Draft/RFC process, and Stack Overflow
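To make "reputation-based voting means strong terms rise, weak terms decline" concrete, here is a minimal sketch of one possible scoring rule. It is not the actual yamz.net algorithm: the function names, the weighting scheme, and the 0.75 / 0.25 thresholds are all assumptions chosen for illustration.

```python
def weighted_score(votes, reputation):
    """Share of reputation-weighted votes that are up-votes.

    votes: user -> +1 (up) or -1 (down)
    reputation: user -> non-negative weight (hypothetical values)
    Returns 0.5 (neutral) when no weighted votes exist.
    """
    total = sum(reputation.get(user, 0.0) for user in votes)
    if total == 0:
        return 0.5
    up = sum(reputation.get(user, 0.0) for user, v in votes.items() if v > 0)
    return up / total


def classify(score, promote_at=0.75, deprecate_at=0.25):
    """Map a score to a term status; the thresholds are illustrative only."""
    if score >= promote_at:
        return "canonical"
    if score <= deprecate_at:
        return "deprecated"
    return "vernacular"


reputation = {"alice": 10.0, "bob": 1.0, "carol": 1.0}
votes = {"alice": 1, "bob": -1, "carol": -1}
status = classify(weighted_score(votes, reputation))
```

In this example one high-reputation up-vote outweighs two low-reputation down-votes, which is the sense in which weighting by reputation resists gaming by many throwaway accounts.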
Summary
• Identifiers are not strings, but associations that break when things are not managed well
• People can forget names because we can google, but APIs need persistent names for automation at scale
• APIs are languages using metadata as “words”
• Future API building will focus on vocabulary building
  • For example, yamz.net
Thank you!