PIDs in Data Infrastructures


Upload: monifa

Post on 02-Feb-2016



TRANSCRIPT

Page 1: PIDs in Data Infrastructures

The Language Archive – Max Planck Institute for Psycholinguistics

Nijmegen, The Netherlands

PIDs in Data Infrastructures

Peter Wittenburg

CLARIN Research Infrastructure

EUDAT Data Infrastructure

Page 2: PIDs in Data Infrastructures

Automatic Workflows

• most data is created automatically as part of workflows; manual operations are the exception

• at data creation time it is not obvious what the future life of the data will be

• associating metadata and PIDs later is troublesome and costly

• thus metadata and PIDs should be generated immediately as part of automated workflows

• data resources need to be referable and often citable (published)

• this requires a reliable, high-performance machinery (registration + resolution) based on stable standards

typically DOIs via DataCite

typically Handles via EPIC
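The slide argues that PIDs and metadata should be generated at creation time. The following is a minimal sketch of a workflow step that mints an identifier and a small metadata record in the same call that produces the data. Several parts are hypothetical: the prefix `99999`, the UUID-based suffix, the metadata fields and the `register_at_creation` function. The actual registration call to a PID service such as EPIC or DataCite is deliberately left out.

```python
import hashlib
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical Handle prefix; a real prefix is assigned by the PID service.
PREFIX = "99999"


@dataclass
class RegisteredObject:
    pid: str
    checksum: str
    metadata: dict = field(default_factory=dict)


def register_at_creation(data: bytes, creator: str, step: str) -> RegisteredObject:
    """Mint a PID and minimal metadata in the same step that produces the data."""
    # A random UUID suffix is unique without coordinating a central counter.
    pid = f"{PREFIX}/{uuid.uuid4()}"
    # Fixity information captured at birth, while the bits are known to be correct.
    checksum = hashlib.sha256(data).hexdigest()
    metadata = {
        "creator": creator,
        "created": datetime.now(timezone.utc).isoformat(),
        "workflow_step": step,
        "size_bytes": len(data),
    }
    # A real workflow would now register the PID (pointing at the storage
    # location) with the PID service; that call is omitted in this sketch.
    return RegisteredObject(pid=pid, checksum=checksum, metadata=metadata)


obj = register_at_creation(b"tier 1", creator="annotator-1", step="segmentation")
print(obj.pid, obj.checksum[:12], obj.metadata["size_bytes"])
```

Because the identifier exists from the first moment, later workflow steps can refer to the object by PID instead of by file path.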

Page 3: PIDs in Data Infrastructures

• assume that we have a recording of an extinct language and some annotations that tell us what someone said about medicine, etc.

• researchers create relations that need to be preserved

Video Recording

Sound Recording

Annotations

Recording Session

Metadata Record

from Repository A

from Repository B

from Repository C

How long, stable and persistent?

we are using Handles from the EPIC service

PID usage in our domain
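The relations on this slide link objects held in different repositories. Those links survive only if they are stored as PIDs rather than as URLs. The sketch below illustrates that idea under clearly hypothetical assumptions: the PIDs, the URLs and the relation names are invented, and a plain dict stands in for Handle resolution (PID to current URL). It reports relations whose endpoints no longer resolve.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    source: str  # PID of the object making the statement
    kind: str    # relation name, e.g. "annotates" (hypothetical vocabulary)
    target: str  # PID of the related object


def dangling(relations, resolver):
    """Return the relations whose source or target PID does not resolve."""
    return [r for r in relations if r.source not in resolver or r.target not in resolver]


# Hypothetical PIDs; the dict stands in for Handle resolution (PID -> current URL).
resolver = {
    "99999/video-1": "https://repo-a.example.org/video-1",
    "99999/audio-1": "https://repo-b.example.org/audio-1",
    "99999/annot-1": "https://repo-c.example.org/annot-1",
}
relations = [
    Relation("99999/annot-1", "annotates", "99999/video-1"),
    Relation("99999/annot-1", "annotates", "99999/audio-1"),
    Relation("99999/session-1", "describes", "99999/video-1"),  # session PID never registered
]
print(dangling(relations, resolver))
```

Suppose the video moves to another repository. Only its resolver entry changes. The relations store PIDs, not URLs, so they stay valid.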

Page 4: PIDs in Data Infrastructures

Biological and cultural processes have evolved together, in a symbiotic spiral; they are now indissolubly linked, with human survival unlikely without such culturally produced aids as clothing, cooked food, and tools. The twelve original essays collected in this volume take an evolutionary perspective on human culture, examining the emergence of culture in evolution and the underlying role of brain and cognition. The essay authors, all internationally prominent researchers in their fields, draw on the cognitive sciences -- including linguistics, developmental psychology, and cognition -- to develop conceptual and methodological tools for understanding the interaction of culture and genome. They go beyond the "how" -- the questions of behavioral mechanisms -- to address the "why" -- the evolutionary origin of our psychological functioning. What was the "X-factor," the magic ingredient of culture -- the element that took humans out of the general run of mammals and other highly social organisms?

Several essays identify specific behavioral and functional factors that could account for human culture, including the capacity for "mind reading" that underlies social and cultural learning and the nature of morality and inhibitions, while others emphasize multiple partially independent factors -- planning, technology, learning, and language. The X-factor, these essays suggest, is a set of cognitive adaptations for culture.

ePublication Repository 1

eResource Repository 2

How long, etc.? Handles from EPIC

PID usage in our domain

Page 5: PIDs in Data Infrastructures

• let's isolate the external properties of our data objects and collections and ignore their content (structure, semantics, packaging, etc.) for a moment

Data Object World

originator, depositor, repository A, user

registered DO: data, metadata (Key-MD), location

handle generator

PID property record: access rights, type (from central registry), ROR flag, mutable flag, transaction record

repository B

work ownership

data, metadata (Key-MD), PID, access rights

hands over

requests

deposits via RAP (Repository Access Protocol)

requests

stores

maintains

receives disseminations

via RAP

replicates

goes back to a paper by Kahn & Wilensky, 2006
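The model above can be sketched as a PID property record plus a replication step from repository A to repository B. This is an illustration under assumptions:

- the field names only mirror the slide's labels;
- the type is treated as an opaque identifier assumed to come from a central registry;
- logging replication as paired `replicated_from` / `replicated_to` transaction entries is a hypothetical convention, not part of the Kahn & Wilensky framework.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PIDPropertyRecord:
    pid: str
    location: str
    access_rights: str
    data_type: str             # assumed: identifier from a central type registry
    ror: bool                  # True only for the Repository-of-Record copy
    mutable: bool
    transactions: tuple = ()   # append-only log of (event, detail) pairs


def replicate(original, replica_pid, replica_location):
    """Return (updated original, replica record) after copying a DO to repository B."""
    # The replica inherits rights, type and mutability, but is not the RoR.
    replica = replace(
        original,
        pid=replica_pid,
        location=replica_location,
        ror=False,
        transactions=(("replicated_from", original.pid),),
    )
    # The original's transaction record notes where the copy went.
    updated = replace(
        original,
        transactions=original.transactions + (("replicated_to", replica_pid),),
    )
    return updated, replica


do_a = PIDPropertyRecord("99999/do-1", "https://repo-a.example.org/do-1",
                         "restricted", "type/annotation", ror=True, mutable=False)
do_a, do_b = replicate(do_a, "88888/do-1", "https://repo-b.example.org/do-1")
print(do_a.transactions, do_b.transactions)
```

The records are frozen, so every change produces a new record value. This fits the idea that the transaction record only grows.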

Page 6: PIDs in Data Infrastructures

• the way we organize data

• other variants are possible

2 DO flavours in our domain

bit sequence (instance)

metadata

PID

DO access via metadata

access via PID

immediate access

?

bit sequence (instance)

metadata

PID

MDO access via metadata

access via PID

search/browse access

Page 7: PIDs in Data Infrastructures

- grouping of related data, for a large variety of reasons:
  - versions of a DO
  - presentations of a DO
  - same interview/experiment
  - many others
- a DO can be part of many collections

collections in our domain (similar to MPEG-21 containers, items, sub-items)

bit sequence

metadata (collection): category 1, category 2, ..., category N; PID1, PID2, ..., PID K

PID collection: associated info

PID1: associated info

PID2: associated info

metadata: category 1, category 2, ..., category N; PID

category 1: associated info

category 2: associated info

ISOcat Registry (ISO 12620, compl. ISO 11179)

PID Registry
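The collection model points in two directions. Category names are defined in a concept registry (ISOcat on the slide), and member identifiers resolve through a PID registry. The sketch below models that under assumptions: the category identifiers, the PIDs and the set-based registries are hypothetical stand-ins, not the ISOcat or EPIC interfaces. It validates a collection against both registries and lists every collection a DO belongs to.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    pid: str
    # category identifier -> value; identifiers assumed to be registered concepts
    categories: dict = field(default_factory=dict)
    members: list = field(default_factory=list)  # PIDs of member DOs or sub-collections


def validate(collection, concept_registry, pid_registry):
    """Return (unregistered category ids, unresolvable member PIDs)."""
    bad_categories = [c for c in collection.categories if c not in concept_registry]
    bad_members = [p for p in collection.members if p not in pid_registry]
    return bad_categories, bad_members


def collections_containing(pid, collections):
    """A DO may belong to many collections; list the PIDs of all that contain it."""
    return [c.pid for c in collections if pid in c.members]


# Hypothetical registries and identifiers.
concept_registry = {"cat:genre", "cat:language"}
pid_registry = {"99999/do-1", "99999/do-2"}

session = Collection("99999/coll-session", {"cat:genre": "interview"},
                     ["99999/do-1", "99999/do-2"])
versions = Collection("99999/coll-versions", {"cat:format": "wav"}, ["99999/do-1"])

print(validate(versions, concept_registry, pid_registry))
print(collections_containing("99999/do-1", [session, versions]))
```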

Page 8: PIDs in Data Infrastructures

EUDAT - common services

two major tracks:
• understanding data organization & practices in communities
• providing first common services after 12 months

Page 9: PIDs in Data Infrastructures

PID Use V1 in EUDAT Federation

domain X

repository X

DO1

PIDx: URL, URLy, URLz, CKSM, Rights, ...

domain Y

repository Y

DO1

domain Z

repository Z

DO1

prefx
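In V1, a single PID under one prefix (prefx) carries the URLs of all replicas plus a checksum. The sketch below shows one way a client might use such a record: try the URLs in order and accept the first copy whose content matches the checksum. Three things are assumptions: the dict layout of the record, the use of SHA-256, and the `fetch` callable, which stands in for an HTTP GET that returns `None` on failure.

```python
import hashlib


def pick_replica(record, fetch):
    """Return the first URL whose content matches the record's checksum, else None."""
    for url in record["URL"]:
        data = fetch(url)
        # Skip unreachable copies and copies whose bits no longer match.
        if data is not None and hashlib.sha256(data).hexdigest() == record["CKSM"]:
            return url
    return None


# Hypothetical V1 record: one PID, several replica URLs, one checksum.
good = b"DO1 bits"
record = {
    "PID": "prefx/DO1",
    "URL": ["https://x.example.org/DO1", "https://y.example.org/DO1", "https://z.example.org/DO1"],
    "CKSM": hashlib.sha256(good).hexdigest(),
    "Rights": "open",
}
# Simulated storage: X is down, Y holds a corrupted copy, Z is intact.
store = {"https://x.example.org/DO1": None, "https://y.example.org/DO1": b"corrupted",
         "https://z.example.org/DO1": good}
print(pick_replica(record, store.get))
```

The weakness this exposes is that all replica locations live in one record under one prefix. Every domain that hosts a copy therefore depends on the owner of that record to keep its URL current.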

Page 10: PIDs in Data Infrastructures

PID Use V2 in EUDAT Federation

domain X

repository X

DO1

PIDx: URL, RoR, HDL, CKSM, Rights, ...

domain Y

repository Y

DO1

PIDy: URL, RoR, HDL, CKSM, Rights, ...

domain Z

repository Z

DO1

PIDz: URL, RoR, CKSM, Rights, ...

prefx, prefy, prefz
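One possible reading of the V2 diagram (an interpretation, not stated on the slide):

- every replica gets its own PID under its own repository's prefix;
- `RoR` names the PID of the repository-of-record copy;
- `HDL` points to the next replica's handle, and PIDz, the last copy, has none.

Under that assumption, the sketch below follows the HDL chain from any replica, finds the repository of record, and checks that all copies carry the same checksum. The record values, including the placeholder checksum, are invented.

```python
# Hypothetical V2 records: one PID per replica, each under its own prefix.
RECORDS = {
    "prefx/DO1": {"URL": "https://repo-x.example.org/DO1", "RoR": "prefx/DO1",
                  "HDL": "prefy/DO1", "CKSM": "placeholder-sum", "Rights": "open"},
    "prefy/DO1": {"URL": "https://repo-y.example.org/DO1", "RoR": "prefx/DO1",
                  "HDL": "prefz/DO1", "CKSM": "placeholder-sum", "Rights": "open"},
    "prefz/DO1": {"URL": "https://repo-z.example.org/DO1", "RoR": "prefx/DO1",
                  "CKSM": "placeholder-sum", "Rights": "open"},
}


def replica_chain(start_pid, records):
    """Follow HDL pointers from start_pid; stop at a record without HDL or on a cycle."""
    chain, seen = [], set()
    pid = start_pid
    while pid is not None and pid not in seen:
        seen.add(pid)
        chain.append(pid)
        pid = records[pid].get("HDL")
    return chain


def repository_of_record(pid, records):
    """Every replica record names the RoR copy, so any PID leads back to it."""
    return records[pid]["RoR"]


def checksums_agree(pids, records):
    """All replicas of one DO should carry the same checksum."""
    return len({records[p]["CKSM"] for p in pids}) == 1


chain = replica_chain("prefx/DO1", RECORDS)
print(chain, repository_of_record("prefz/DO1", RECORDS), checksums_agree(chain, RECORDS))
```

Compared with V1, each domain now controls the PID record for its own copy. The `RoR` and `HDL` entries keep the copies connected.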

Page 11: PIDs in Data Infrastructures

• EPIC (European PID Consortium: CSC, SARA, GWDG, more)
• large data centers with national/organizational (MPS) support
• applying redundancy schemes (persistence, availability)
• reliability, robustness, performance (registration, resolution)
• all using the same API (agreement on the associated information)

• thus the PID syntax is not crucial, but storing/finding information is
• feasible business model for science
• security of the administration DB for the system
• persistent and balanced governance for the HS (Handle System)

• need a worldwide registry of agreed information types to feed our "stupid" machines

EUDAT relying on EPIC + Handles
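"Storing/finding information" in practice means reading the typed values attached to a handle. The sketch below groups those values by type from a resolution response. Its field layout (`responseCode`, `handle`, `values`, each value with `index`, `type` and `data.value`) is modelled on the JSON returned by the Handle.net proxy at `https://hdl.handle.net/api/handles/<handle>`, where `responseCode` 1 signals success. The sample response itself is made up: the handle, the URLs and the `CHECKSUM` type are invented, and the network call is left out so the example runs offline.

```python
import json

# Made-up response in the proxy's JSON layout; handle, URLs and CHECKSUM are invented.
SAMPLE_RESPONSE = """
{
  "responseCode": 1,
  "handle": "99999/abc",
  "values": [
    {"index": 2, "type": "URL", "data": {"format": "string", "value": "https://repo-b.example.org/abc"}},
    {"index": 1, "type": "URL", "data": {"format": "string", "value": "https://repo-a.example.org/abc"}},
    {"index": 3, "type": "CHECKSUM", "data": {"format": "string", "value": "placeholder-sum"}}
  ]
}
"""


def values_by_type(response_text):
    """Group a handle record's values by type, ordered by their index."""
    doc = json.loads(response_text)
    if doc.get("responseCode") != 1:  # 1 means success; anything else is a failure
        raise LookupError(f"resolution failed, responseCode={doc.get('responseCode')}")
    grouped = {}
    for value in sorted(doc["values"], key=lambda v: v["index"]):
        grouped.setdefault(value["type"], []).append(value["data"]["value"])
    return grouped


print(values_by_type(SAMPLE_RESPONSE))
```

The handle syntax never matters here. Clients only need to agree on what each value type means, which is why the slide calls for a registry of agreed types.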

Page 12: PIDs in Data Infrastructures

Information types in discussion

• multiple links to resources
• checksum
• link to metadata
• citation metadata
• RoR statement
• mutability flag
• persistency statement
• pointers to presentation versions
• provenance statement
• collection statement
• pointer to rights

• (support for parts/fragments)
• (actionable PIDs)

- need agreements
- need standard APIs

for EUDAT this is crucial
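Machines can act on the information types listed above only if each type name is registered together with a rule for its values. The sketch below shows that idea with a small in-memory registry and a record checker. The type names, the validation rules and the SHA-256 hex format for checksums are all hypothetical; they illustrate what "agreed information types" could mean, not an existing registry. A record is a list of (type, value) pairs because some types, such as links to resources, may occur more than once.

```python
import re

# Hypothetical registry: agreed type name -> validator for its value.
TYPE_REGISTRY = {
    "URL": lambda v: v.startswith(("http://", "https://")),
    "CHECKSUM": lambda v: re.fullmatch(r"[0-9a-f]{64}", v) is not None,  # assumes SHA-256 hex
    "METADATA": lambda v: v.startswith(("http://", "https://")),
    "RIGHTS": lambda v: v.startswith(("http://", "https://")),
    "ROR": lambda v: v in ("true", "false"),
    "MUTABLE": lambda v: v in ("true", "false"),
}


def check_record(record):
    """Return problems found: unregistered types and values failing their validator."""
    problems = []
    for type_name, value in record:
        validator = TYPE_REGISTRY.get(type_name)
        if validator is None:
            problems.append(f"unregistered type: {type_name}")
        elif not validator(value):
            problems.append(f"invalid {type_name} value: {value!r}")
    return problems


record = [
    ("URL", "https://repo-a.example.org/do-1"),
    ("URL", "https://repo-b.example.org/do-1"),
    ("CHECKSUM", "a" * 64),
    ("MUTABLE", "false"),
]
print(check_record(record))
```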