IETF BOF Data Set Identifier Interoperability

IETF BOF DSII, July 2012. Beth Plale, Director, Data To Insight Center, Indiana University

Page 1: IETF BOF Data Set Identifier Interoperability

IETF BOF DSII, July 2012

IETF BOF
Data Set Identifier Interoperability

Beth Plale
Director, Data To Insight Center

Indiana University

Page 2: IETF BOF Data Set Identifier Interoperability

The DSII BOF

Discussion of persistent identifier solutions (part I) and steps to achieving interoperability among persistent identifiers (part II) for data sets made available on the Internet.

The initial use case: scientific data sets produced by different research teams.

Other use cases: media developed by different sources and combined into a common collection.

This BoF is not intended to form a working group at this session.

Page 3: IETF BOF Data Set Identifier Interoperability

Science Data Deluge

• A lot of the data being generated is in the sciences: through ocean instruments, air quality sensors, gene sequencing machines, climate models …
• Research funding agencies want research data from funded efforts to be available for reuse, today and decades into the future:
– “The National Science Foundation is committed to the principle that the various forms of data collected with public funds belong in the public domain.” (Data Archiving Policy, NSF Social, Behavioral and Economic Sciences)

Page 4: IETF BOF Data Set Identifier Interoperability

Problem acute in Long Tail

[Figure: power law graph showing popularity ranking. To the right (in yellow) is the long tail; to the left are the few that dominate. Note that the areas of both regions are equal.]

Page 5: IETF BOF Data Set Identifier Interoperability

Long tail and on-line business

• Chris Anderson (Wired, 2004) popularized the term “long tail”. It has two complementary ideas:
– First, that merchandise assortments can grow because goods are not limited by shelf space, and
– Second, that online venues change the demand curve because consumers value niche products.
• These complementary forces result in a tail that steadily grows both longer, as more obscure products are made available, and fatter, as consumers discover products better suited to their tastes.

Page 6: IETF BOF Data Set Identifier Interoperability

Long tail and data

• Emerging trend in science: inexpensive instruments producing huge volumes of data.
– E.g., a genetic sequencing machine, inexpensive enough for purchase by a research lab, yet producing terabytes of data with every run.
• The long tail of science and scholarly activity goes beyond simply project size to encompass the set of sub-disciplines that carry out “small or localized science”.
• These are researchers whose collective numbers actually account for an enormous amount of data-driven science.

Page 7: IETF BOF Data Set Identifier Interoperability

Key role of Metadata in Science Data

• Metadata must be preserved when scientific data is generated because metadata is ephemeral (Jim Gray)
• “The management, organization, access, and preservation of digital data is arguably a ‘grand challenge’ of the information age” (Fran Berman, 2008)
• If annotation is left to the scientist, it is not done (U.K. e-Science Core)
• The greater the distance between data producer and re-use, the more detailed the metadata that’s required.

Page 8: IETF BOF Data Set Identifier Interoperability

Generalizing to Needs for Tracking “the Object”

• Definition of “objects”: information resources that could be
– Data sets
– Digital documents
– Software
– Websites
– Physical objects: books, bones, statues, etc.
– Intangible objects: chemicals, diseases, vocabulary terms, performances

Area of largest concern

Page 9: IETF BOF Data Set Identifier Interoperability

Metadata Associated with Identifier

Includes: checksums, pointer to metadata, rights information, also:

C: [opens session]
C: GET http://ark.nlm.nih.gov/ark:/12025/psbbantu? HTTP/1.1
C:
S: HTTP/1.1 200 OK
S: <snip>
S: erc:
S: who: Lederberg, Joshua
S: what: Studies of Human Families for Genetic Linkage
S: when: 1974
S: where: http://profiles.nlm.nih.gov/BB/A/N/T/U/_/bbantu.pdf
S: [closes session]
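The server's reply in the session above is a short list of "label: value" lines. As a minimal sketch of how a client might turn such a reply body into a lookup table, the Python below parses that text into a dictionary. The function name `parse_erc` and the `SAMPLE` string are illustrative assumptions, and the parser handles only the simple one-line-per-field form shown on the slide, not the full record syntax.

```python
def parse_erc(text: str) -> dict[str, str]:
    """Parse simple 'label: value' lines into a dict (hypothetical helper)."""
    record: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        # Split on the first colon only, so URLs in the value stay intact.
        label, _, value = line.partition(":")
        label, value = label.strip(), value.strip()
        if value:  # skip the bare "erc:" header line
            record[label] = value
    return record


# Reply body copied from the session on the slide (the "S:" prefixes removed).
SAMPLE = """\
erc:
who: Lederberg, Joshua
what: Studies of Human Families for Genetic Linkage
when: 1974
where: http://profiles.nlm.nih.gov/BB/A/N/T/U/_/bbantu.pdf
"""

record = parse_erc(SAMPLE)
print(record["who"], record["when"])
```

Splitting on the first colon only is the one design choice that matters here: the "where" value is itself a URL containing colons.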

Page 10: IETF BOF Data Set Identifier Interoperability

Operations performed upon identifiers

• discovery,
• data access,
• access control, and
• logical arrangement.

We find cases for all of these operations, implying the need for multiple identifiers

Page 11: IETF BOF Data Set Identifier Interoperability

Governance and Cost

• Where are resolvers/assigners run?
• Is the distribution model for resolvers scalable to the levels needed by data object discovery and use?
• What organization(s) have long-term oversight over the continued existence of resolving/assigning/interoperability services?

Page 12: IETF BOF Data Set Identifier Interoperability

Part II: Data set identifier Interoperability

• Metadata interoperability
• Relationship interoperability
• Service interoperability

Page 13: IETF BOF Data Set Identifier Interoperability

Metadata Interoperability

• One solution: universal implementation of a common metadata scheme for all identifier schemes
• Otherwise: mechanisms through which it is possible to
– Use descriptive metadata associated with one identifier in the context of another identifier;
– Aggregate descriptive metadata associated with several different identifiers in a single context.
• And do so without loss of semantic value (meaning).
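One way to picture the "otherwise" mechanisms is a crosswalk: a per-scheme table that maps native field names onto a shared vocabulary, plus an aggregation step that keeps track of which identifier each value came from. The Python below is only a sketch under stated assumptions. The scheme name `scheme_b`, the identifier `example-b:0001`, the common field names, and both helper functions are hypothetical and are not taken from any identifier system named in the slides.

```python
# Hypothetical crosswalks: native field name -> common field name.
# Both the scheme names and the common vocabulary are illustrative assumptions.
CROSSWALKS = {
    "erc": {"who": "creator", "what": "title", "when": "date", "where": "location"},
    "scheme_b": {"author": "creator", "name": "title", "issued": "date"},
}


def to_common(scheme, native):
    """Translate a native record; return (common, unmapped) so nothing is silently dropped."""
    mapping = CROSSWALKS[scheme]
    common, unmapped = {}, {}
    for field, value in native.items():
        if field in mapping:
            common[mapping[field]] = value
        else:
            unmapped[field] = value
    return common, unmapped


def aggregate(records):
    """Combine common-form records keyed by identifier, keeping each value's source."""
    merged = {}
    for identifier, common in records.items():
        for field, value in common.items():
            merged.setdefault(field, []).append((identifier, value))
    return merged


erc_common, erc_unmapped = to_common(
    "erc",
    {"who": "Lederberg, Joshua",
     "what": "Studies of Human Families for Genetic Linkage",
     "when": "1974"},
)
b_common, b_unmapped = to_common(
    "scheme_b",
    {"author": "Lederberg, Joshua", "issued": "1974", "checksum": "abc123"},
)
combined = aggregate({
    "ark:/12025/psbbantu": erc_common,
    "example-b:0001": b_common,  # made-up identifier in a made-up scheme
})
print(combined["creator"])
```

Returning the unmapped fields separately, rather than discarding them, is a small nod to the last bullet: a field with no counterpart in the common vocabulary is reported instead of being lost.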

Page 14: IETF BOF Data Set Identifier Interoperability

Relationship Interoperability

• Standard mechanisms for expressing relationships between the objects identified under different identifier schemes
– "The publisher identified with this [standard party identifier] is the publisher of this journal identified with this ISSN."
• This implies development of a standard set of typed relationships between identifiers with well-defined semantics.
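A typed relationship of the kind quoted above can be modeled as a subject identifier, a relation drawn from a fixed vocabulary, and an object identifier. The following Python is a rough sketch of that shape only. The relation names, the `party-id` scheme label, and the identifier values are placeholders invented for illustration, not entries from any real registry or standard.

```python
from dataclasses import dataclass
from enum import Enum


class RelationType(Enum):
    """A small, closed vocabulary of relations (names are assumptions)."""
    IS_PUBLISHER_OF = "isPublisherOf"
    IS_PART_OF = "isPartOf"
    IS_VERSION_OF = "isVersionOf"


@dataclass(frozen=True)
class Identifier:
    scheme: str  # e.g. "issn", "ark", "doi"
    value: str


@dataclass(frozen=True)
class Relationship:
    subject: Identifier
    relation: RelationType
    obj: Identifier


def related(relationships, subject, relation):
    """Return the objects that `subject` points to through `relation`."""
    return [r.obj for r in relationships
            if r.subject == subject and r.relation == relation]


# Placeholder identifiers; neither value refers to a real publisher or journal.
publisher = Identifier("party-id", "example-publisher-001")
journal = Identifier("issn", "0000-0000")

graph = [Relationship(publisher, RelationType.IS_PUBLISHER_OF, journal)]
print(related(graph, publisher, RelationType.IS_PUBLISHER_OF))
```

Using an enumeration rather than free-text relation names is what makes the relationships "typed": two systems can only agree on semantics if they draw from the same closed set.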

Page 15: IETF BOF Data Set Identifier Interoperability

Service Interoperability

• The creation of common services:
– “...the use of shared syntax or physical interface for request/response for provision of services and/or data.”
• Types of services might include:
– Metadata look-up services: user resolves an identifier to a set of metadata about the object
– Identifier discovery services: user with a limited set of metadata can discover the identifier or identifiers for that object.
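The two service types above can be read as two operations on one shared interface: identifier in, metadata out; and partial metadata in, identifiers out. The Python below sketches that idea with a structural interface and a toy in-memory implementation. The names `IdentifierService`, `InMemoryService`, `lookup_metadata`, and `discover_identifiers` are assumptions made for illustration, as is the second, made-up identifier; no existing resolver exposes this exact API.

```python
from typing import Protocol


class IdentifierService(Protocol):
    """Hypothetical shared interface that any identifier scheme could implement."""

    def lookup_metadata(self, identifier: str) -> dict[str, str]: ...

    def discover_identifiers(self, **known: str) -> list[str]: ...


class InMemoryService:
    """Toy implementation backed by a dict, standing in for a real resolver."""

    def __init__(self, records: dict[str, dict[str, str]]) -> None:
        self._records = dict(records)

    def lookup_metadata(self, identifier: str) -> dict[str, str]:
        # Metadata look-up: identifier -> metadata about the object.
        return dict(self._records[identifier])

    def discover_identifiers(self, **known: str) -> list[str]:
        # Identifier discovery: every identifier whose metadata matches
        # all of the supplied field values.
        return sorted(
            ident for ident, meta in self._records.items()
            if all(meta.get(field) == value for field, value in known.items())
        )


service: IdentifierService = InMemoryService({
    "ark:/12025/psbbantu": {"who": "Lederberg, Joshua", "when": "1974"},
    "example:0002": {"who": "Someone Else", "when": "1980"},  # made-up entry
})

print(service.lookup_metadata("ark:/12025/psbbantu"))
print(service.discover_identifiers(when="1974"))
```

Because the interface is structural, a Handle-backed or ARK-backed service would only need to provide the same two methods to be interchangeable from the caller's point of view.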

Page 16: IETF BOF Data Set Identifier Interoperability

References

• EPIC: European based. Works with the Handle System. http://www.pidconsortium.eu
• EZID: long-term identifiers made easy; works with both DataCite DOI and ARK. http://n2t.net/ezid
• The ARK Identifier Scheme, Internet-Draft, 2012-04. http://www.ietf.org/internet-drafts/draft-kunze-ark-16.txt
• The Handle System. http://www.handle.net/
– Handle System Overview, Nov 2003, RFC 3650
– Handle System Namespace and Service Definition, Nov 2003, RFC 3651
– Handle System Protocol (v2.1), Nov 2003, RFC 3652
• Terminology and Use Cases for Interoperability of Identifier Resolution Systems, Internet-Draft, 2012-07. https://datatracker.ietf.org/doc/draft-kahn-dsii-id-res-sys/
• On the utility of identification schemes for digital earth science data: an assessment and recommendations. http://rd.springer.com/article/10.1007/s12145-011-0083-6/fulltext.html
• Identifier Interoperability: A Report on Two Recent ISO Activities. http://www.dlib.org/dlib/april06/paskin/04paskin.html