Co-operation and promotion of Information Resources in Science and Technology

Beijing Oct 23 2006

Norman Paskin

DOI SYSTEM AND ITS APPLICATIONS

International DOI Foundation

Outline

1. Naming (identifying) resources on the internet
• The problem
• Handles
• DOIs

2. Meaning of resources on the internet
• Mapping meanings through metadata

3. DOI System
• Current position of the DOI system

1. Naming

• Assigning an identifier to a referent

• Identifier: unique persistent alphanumeric string (“number”, “name”, “lexical token”) specifying a referent
– Unique: one to many: an identifier specifies one and only one referent (but a referent may have more than one identifier)
– Persistent: once assigned, does not change referent

• Resolution: process by which an identifier is input to a network service which returns its associated referent and/or descriptive information about it (metadata).

• Referent: the object which is identified by the identifier, whether or not resolution returns that object.

• Object: any entity within the scope of the identifier system.
– may be abstract, physical or digital, since all these forms of entity are of relevance in content management (e.g. creations, resources, agreements, people, organisations)
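The naming definitions (unique, persistent, resolution, referent) can be illustrated with a minimal sketch. It assumes a toy in-memory registry; the class `IdentifierRegistry`, its method names and the identifiers `id-1` and `id-2` are hypothetical and do not belong to any system described in these slides.

```python
class IdentifierRegistry:
    """Toy registry, hypothetical and for illustration only."""

    def __init__(self):
        self._referents = {}  # identifier -> referent

    def assign(self, identifier, referent):
        # Persistent: once assigned, an identifier may not change referent.
        current = self._referents.get(identifier, referent)
        if current != referent:
            raise ValueError(f"{identifier!r} already names {current!r}")
        self._referents[identifier] = referent

    def resolve(self, identifier):
        # Unique: an identifier specifies one and only one referent.
        return self._referents[identifier]

    def identifiers_for(self, referent):
        # A referent may nevertheless have more than one identifier.
        return sorted(i for i, r in self._referents.items() if r == referent)


registry = IdentifierRegistry()
registry.assign("id-1", "Robinson Crusoe (the work)")
registry.assign("id-2", "Robinson Crusoe (the work)")
```

Here two identifiers name the same referent, while an attempt to point `id-1` at a different referent would be rejected.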

Naming

• First class naming: Digital Object Architecture
– “Digital information needs to be a first class citizen in the networked environment” (Kahn/Wilensky 1995)
– First class = one that has an identity independent of any other item

• Handle system
– Part of the Digital Object Architecture: a system for persistent naming for digital objects and other resources on the Internet, and efficiently resolving those names to data

• DOI (Digital Object Identifier) system
– One application of the Handle System, which adds to it additional features: social and technical infrastructure, policies, metadata management.

• Internet
– the global information system that is logically linked by a globally unique address space and communications using TCP/IP and provides high level services layered on these (or successors)
– Not DNS; not the Web (includes P2P, VoIP, etc.)

• DNS: Domain Name System
– maps domain names (computer hostnames) to IP addresses.

What is being named? Three key problems

• Granularity: the extent to which a collection of information has been subdivided for purposes of identification (e.g. a collection; a book; tables and figures)
– Functional Granularity: it should be possible to identify an entity whenever it needs to be distinguished

• Precisely what is being named?
– The work “Robinson Crusoe”?
– The Norton edition of “Robinson Crusoe”?
– The PDF version of the Norton edition of…. ?
– The PDF version of…held on this server…?
– Most digital objects of interest have compound form, simultaneously embodying several referents
– Resolution of an identifier may give the referent, or only metadata; or a “manifestation”

• Resolution of an identifier
– Persistence: “get me the right thing”
– Contextual resolution: “get me the thing that is right for me”
– Appropriate copy resolution (e.g. OpenURL context-sensitive linking): same content in different contexts
– Full contextual resolution (e.g. DVIA): different content in different contexts
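Appropriate copy resolution ("same content in different contexts") can be sketched roughly as follows. This is only an illustration: the `COPIES` table, the context label `library-a` and the `.example` addresses are invented, and real services such as OpenURL linking work quite differently in detail.

```python
# Hypothetical data: one identifier, several copies of the same content.
COPIES = {
    "10.1234/NP5678": {
        "default": "http://publisher.example/NP5678",
        "library-a": "http://mirror.library-a.example/NP5678",
    }
}


def resolve(identifier, context=None):
    """Return the copy suited to the requester's context, else the default."""
    copies = COPIES[identifier]
    return copies.get(context, copies["default"])
```

A requester with no known context, or an unrecognised one, gets the default copy; a requester identified as `library-a` gets that library's copy of the same content.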

Resolution

• DNS is current basis of resolution of web-based identifiers
– URL: not a first class name; an attribute: a location of a file on the WWW
  • specification allows addressing by full path to host (IP address); rarely used.
  • if the content of the file is moved, the URL link won't find it ("404 not found", or manual redirection, or automated redirection which may not persist).
  • if the content, but not location, of the file is changed, a user may not know this.
– URN: naming convention for the content of files.
  • Specification independent of technologies; but DNS the only present technique
  • No widely standardised ways of using this: can't type URNs into browsers except in certain special circumstances.
– URI: collective name for URN and URL schemes.

• Not the basis of other non-web identifiers
– e.g. Skype names

• DNS not a good general-purpose name system
– Does not meet requirements of first class name + appropriate granularity
– Not first class names: all URIs at one location have to be ultimately managed by the same domain name owner, which makes URLs brittle for any piece of content which could possibly change owners
– No granularity of administration per name by anyone other than a network administrator
– URLs are grouped by domain name and then by some hierarchical structure, originally based on file trees, now possibly unconnected from that but still a hierarchy
– Problems of security, updating and internationalisation
– Potential scalability problems in the face of new technologies

What is the problem?

• Managing information in the Net over very long periods of time: centuries or more
• Dealing with very large amounts of information in the Net over time
• Information, location(s) and systems may change dramatically over time
• Respecting and protecting rights, interests and value
• Allow for
– arbitrary types of information systems
– dynamic formatting and data typing
– interoperability between multiple different information systems
– metadata schema to be identified and typed

• Solution to this problem was put forward as Digital Object Architecture (Kahn/Wilensky 1995+) and has been successfully developed and deployed

• Handle System: resolution of unique identifiers
– Maps an identifier into “state information” about the Digital Object
– Identifiers are known as “Handles”
– Format is “prefix/suffix” (e.g. 10.100/1234)
– Prefix is unique to a naming authority
– Suffix can be any string of bits assigned by that authority
– Handle System is a general purpose resolution system
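The “prefix/suffix” format can be illustrated with a small parsing sketch. The names `Handle` and `parse_handle` are hypothetical, and splitting on the first slash only (so that a suffix may itself contain slashes) is an assumption made here for illustration.

```python
from typing import NamedTuple


class Handle(NamedTuple):
    prefix: str  # unique to a naming authority
    suffix: str  # any string assigned by that authority


def parse_handle(text: str) -> Handle:
    # Assumption: the first "/" separates prefix from suffix.
    prefix, sep, suffix = text.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not in prefix/suffix form: {text!r}")
    return Handle(prefix, suffix)
```

For the slide's example, `parse_handle("10.100/1234")` yields the prefix `10.100` and the suffix `1234`.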

Handles resolve to typed data

Handle: 10.123/456

Data type   Index   Handle data
URL         1       http://acme.com/….
URL         2       http://a-books.com/….
DLS         9       acme/repository
HS_ADMIN    100     acme.admin/jsmith
XYZ         100111001111012
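As a minimal sketch of what "typed data" means in practice, a handle record can be modelled as a list of values, each with an index, a data type and data. The names `HandleValue`, `RECORD` and `values_of_type` are hypothetical; the rows are those shown for handle 10.123/456, with the elided URLs kept exactly as given.

```python
from typing import List, NamedTuple


class HandleValue(NamedTuple):
    index: int
    data_type: str
    data: str


# Rows copied from the slide; the XYZ row is omitted from this sketch.
RECORD = {
    "10.123/456": [
        HandleValue(1, "URL", "http://acme.com/…."),
        HandleValue(2, "URL", "http://a-books.com/…."),
        HandleValue(9, "DLS", "acme/repository"),
        HandleValue(100, "HS_ADMIN", "acme.admin/jsmith"),
    ]
}


def values_of_type(handle: str, wanted_type: str) -> List[HandleValue]:
    """Return every value of one data type held under a handle."""
    return [v for v in RECORD[handle] if v.data_type == wanted_type]
```

A client interested only in locations would ask for the `URL` values and receive the entries at index 1 and 2; other types are simply ignored.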

Handle System

• Part of the Digital Object Architecture: www.handle.net (Bob Kahn)
• Basic resolution system for Internet: identify objects, not servers.
• Optimized for speed, reliability, scaling (compared to DNS)
• Open, well-defined protocol and data model (RFC 3650, 3651, 3652)
– free protocol; service at cost (non-profit);
– freely available to be used as engine underneath other named identifiers.
• Separation of control of the handle and who runs the servers
– distributed administration, granularity at the handle level
• Any Unicode character set
– China: CNNIC (.CN registrar) has integrated DNS and handle
• All transactions can be secure and certified
– own PKI as an option
• Not all data public: individual values within a handle can be private.
• No semantics in the identifier
• Logically centralized, physically distributed and highly scalable
• Does not need DNS, but can work with DNS:
– deployed via tools, e.g. HTTP proxies, client plug-ins, server software, etc.

Handle System use

• Provides infrastructure for application domains, e.g., digital libraries & publishing, network management, id management ...

• Library of Congress
• DTIC (Defense Technical Information Center)
• IDF (International DOI Foundation)
– CrossRef (scholarly journal consortium)
– Office of Publications of the European Community
– CAL (Copyright Agency Ltd - Australia)
– MEDRA (Multilingual European DOI Registration Agency)
– Nielsen BookData (bibliographic data - ISBN)
– R.R. Bowker (bibliographic data - ISBN)
– German National Library of Science and Technology etc.
• NTIS (National Technical Information Service)
• D-Space (MIT + HP)
• ADL (DoD Advanced Distributed Learning initiative)
• Several Digital Library projects (e.g. ARROW)
• In development: Globus Alliance (for GRID computing)

Handle System use

• Assigned Prefixes
– DOI 2028
– DSpace 453
– Other apps 406

• Handles
– DOI 25+ M
– Other: additional millions (total per prefix known only to prefix manager; e.g. LANL adding 600M but privately)

• Global Handle System
– Core: three service sites (added locations being considered)
– c. 50 million direct resolutions per month
– c. 50 million proxy server resolutions

The DOI System

• DOI (Digital Object Identifier) system: www.doi.org

• Initially developed (1998) from the publishing industry but now wider

• Currently being standardised in ISO (TC46/SC9)
– the home of ISBN etc. “content identifiers”

• One application of the Handle System
– adds to it additional features: social and technical infrastructure, policies, metadata management.

[Diagram: the DOI system (doi>) shown as three components: naming scheme and resolution; policies; data model for declaring meaning]

Naming scheme and resolution

• The Handle System
• An identifier “container” e.g.
– 10.1234/NP5678
– 10.5678/ISBN-0-7645-4889-4
– 10.2224/2004-10-ISO-DOI

• Resolve from DOI to data
– Initially resolve to location (URL): persistence
– May be to multiple data:
  • Multiple locations
  • Metadata
  • Services
  • Extensible
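Resolving a DOI to a location through an HTTP proxy can be sketched as building a URL from the DOI name. This is an illustration under stated assumptions: the proxy address `https://doi.org/` and the check that a prefix starts with `10.` (as in all the examples on this slide) are assumptions, and `doi_to_url` is a hypothetical helper. No network request is made.

```python
from urllib.parse import quote

PROXY = "https://doi.org/"  # assumption: address of a public DOI proxy


def doi_to_url(doi: str) -> str:
    """Build a proxy URL for a DOI name of the form prefix/suffix."""
    prefix, sep, suffix = doi.partition("/")
    # Assumption: DOI prefixes begin with "10.", as in the slide's examples.
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI name: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path; keep "/".
    return PROXY + quote(doi, safe="/")
```

Because the DOI name stays the same while the data behind it can be updated, a link built this way keeps working when the content moves; that is the persistence point made above.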

DOI policies

• Implementation through International DOI Foundation
• Not-for-profit body: federation of appointed agencies
– Governance and agreed scope, policy, “rules of the road”
– Technical infrastructure: resolution mechanism, proxy servers, mirrors, back-up, central dictionary
– Social infrastructure: persistence commitments, fall-back procedures, cost-recovery (self-sustaining), shared use of IDF tools etc.
• Registration agencies
– Each can develop own applications
– Any business model
– Use in “own brand” ways appropriate for their community

Data Model for declaring meaning

• DOI Data Model = Metadata tools:
– a data dictionary to define
– a grouping mechanism to relate
• Necessary for interoperability
• Able to use existing metadata
– Mapped using a standard dictionary
– Can describe any entity at any level of granularity
• See “DOI and data dictionaries” www.doi.org

2. Meaning

• Assigning metadata to a referent, to enable semantic interoperability
– “say what the referent is”
– Resolution of an identifier may give the referent, or only metadata; or a “manifestation”

• Semantic:
– Do two identifiers from different schemes actually denote the same referent?
– If A says “owner” and B says “owner”, are they referring to the same thing?
– If A says “released” and B says “disseminated”, do they mean different things?

• Interoperability: the ability for identifiers to be used in services outside the direct control of the issuing assigner
– Identifiers assigned in one context may be encountered, and may be re-used, in another place or time, without consulting the assigner. You can’t assume that your assumptions made on assignment will be known to someone else.

• Persistence = interoperability with the future
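The semantic questions above ("owner" versus "owner", "released" versus "disseminated") can be illustrated by mapping each scheme's terms to a shared dictionary and comparing the results. This is a sketch only: the `DICTIONARY` table, the concept identifiers and the chosen meanings are all invented for illustration and are not taken from any actual data dictionary.

```python
# Hypothetical dictionary: (scheme, local term) -> shared concept identifier.
DICTIONARY = {
    ("A", "owner"): "concept:rights-holder",
    ("B", "owner"): "concept:physical-possessor",
    ("A", "released"): "concept:made-available",
    ("B", "disseminated"): "concept:made-available",
}


def same_meaning(scheme_1, term_1, scheme_2, term_2):
    """True if both terms map to the same dictionary concept."""
    return DICTIONARY[(scheme_1, term_1)] == DICTIONARY[(scheme_2, term_2)]
```

In this invented example the two uses of "owner" turn out to differ, while "released" and "disseminated" coincide; the spelling of a term alone settles neither question.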

Tools to ensure meaning

• Basis: “Interoperability of Data in E-Commerce Systems” (indecs): http://www.indecs.org 1998-2000

• Focus: generic intellectual property and how to make data about it interoperable

• Who: EC + groups from the content, author, creator, library, publisher and rights communities

• What: Pioneered a model of event-based metadata as a solution for integrating management of rights.

• Led to: a structured ontology (data dictionary); tools for mapping terms precisely; inference tools etc.
– contextual ontology architecture

[Diagram, shown over three slides: a metadata scheme (e.g. ONIX) and a metadata scheme (e.g. LOM), linked by an agreed term-by-term mapping or “crosswalk”]

Tools to ensure meaning

“Contextual Ontology” approach is used in:

• ISO MPEG-21 Rights Data Dictionary (http://iso21000-6.net/)
• DOI Data Dictionary (http://www.doi.org)
• DDEX digital data exchange - music industry (http://ddex.net/)
• ONIX: Book industry (+) messaging schemas (www.editeur.org)
• Rightscom’s OntologyX - licensee of output, plus own work on tools (www.rightscom.com)
• Digital Library Federation - communication of licence terms (ERMI: ONIX for licensing terms)
• ACAP: Content Access (http://www.the-acap.org/)

etc.

3. DOI System in application

DOI System solves the problems of:

• Naming: prerequisite for management of digital information entities
• Meaning: prerequisite for enabling digital information entities to interact

And also:

• Building a practical system to do this

[Screenshot: www.doi.org, showing recent news, a link to archive news and the e-mail news alert service]

Two consistent aims since 1998

(1) DOI: development in three tracks

[Diagram labels: Single redirection (persistent identifier); Metadata; Multiple resolution; Initial implementation; Full implementation; Activity tracking; Other efforts, standards, etc.]

A continuing development activity

(2) Creation of an organisation

[Diagram labels: IDF; RA; C; Operating Federation; cost-reduction; development spend]

Currently 7 RAs: but one dominates

[Chart: Cumulative DOI Deposits. Y-axis: cumulative DOIs assigned, 0 to 25,000,000. X-axis: year, 1997 to 2006. Series: OPOCE, TIB, Nielsen, Bowker, mEDRA, CAL, CrossRef]

But prefix development improving

[Chart: Cumulative DOI Prefixes, by RA per year. Y-axis: cumulative DOI prefixes, 0 to 1,400. X-axis: year, 1997 to 2006. Series: OPOCE, TIB, Nielsen, Bowker, mEDRA, CAL, CrossRef]

Increase in RA role

IDF supported by 24 member organisations
– general members (not RAs)
– operational (RA = Registration Agency) members

Year            Number of RAs (end year)   % of revenues from RAs
1999            0                          0
2000            1                          <10
2001            3                          20
2002            6                          37
2003            7                          47
2004            9                          60
2005            7                          70
2006 Forecast   7                          67

Current strategy

• Focus on enabling current RAs to generate more DOIs
• New RAs in new areas
• Social infrastructure development (RA policies)
• Business model:
– Incentive scheme: large discounts per DOI for large numbers of registrations, e.g. 25% -> 90%+

[Diagram labels: IDF; RA; C; “IDF has no role in this”]

Implementation of strategy

• RAs focus on building applications in their existing sectors
– viability of business models
– lower costs per DOI (for volume)

• IDF focus on tools for RAs:
– Resolution: e.g. Acrobat plug-in
– Multiple resolution: DOI-AP framework
– Semantic interoperability: Data Dictionary
– Contextual resolution: OpenURL, DVIA

ISO standardisation

• DOI system as an ISO standard

• Within ISO TC46 SC9
– ISO/TC 46 = "Information and documentation"
– Subcommittee 9 = "Presentation, identification and description of documents": ISBN, ISSN, ISMN, ISRC, ISAN, V-ISAN, ISWC, ISTC

• Aim is to codify system by reference to components
– IDF becomes ISO appointed authority for DOI standard
– ISO standard is basis of operating procedures (Handbook)

Sept 06: Working Group reviews
Nov 06: Committee Draft

Likely completion 2007 or 2008
