Digital Object Identifier
Norman Paskin, International DOI Foundation
What is DOI?
• A unique identifier for "a piece of content" on digital networks
• Digital object interoperability
[Figure: DOI components: numbering scheme; description by structured metadata; resolution by Handle; policies]
Analogy: the physical bar code
• A unique identifier for "a piece of content" in the physical world
• Single, common system: UPC/EAN Bar Code
• Components: code writers, readers, policies, etc.
• Many uses: once assigned, usable by anyone in the chain
• Wide community support made it work
• Self-sustaining cost recovery model etc.
• Standard: helps to integrate systems efficiently
What is the DOI?
“The DOI is the UPC (Bar Code) for objects of intellectual property on the Internet.”
1. Uniquely identifies “content” – enables management of transactions of all kinds
2. Provides a stable, persistent link – to the content itself or to services
3. Can be used to articulate services as real world applications – using metadata, multiple resolution, rules, etc.
This presentation
• Show DOI as a combination of components
– use existing standards, including Handle
• Show examples of services (applications) built on DOI
– examples here are web-based
– but DOI is applicable to all platforms
DOI syntax can include any existing identifier, formal or informal, of any entity
• An identifier “container”, e.g.
– 10.1234/5678
– 10.2341/0-7645-4889-1
– 10.5678/978-0-7645-4889-4
– 10.1000/ISBN 0764548891
– 10.1234/Norman_presentation
– 10.2224/2003-1-29-CENDI-DOI
– etc.
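The "container" idea can be illustrated with a minimal sketch. It assumes, as the examples suggest, that a DOI is a prefix beginning with "10." followed by "/" and a free-form suffix; the function name split_doi is hypothetical and not part of any DOI tooling.

```python
from typing import Tuple


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI into (prefix, suffix) at the first "/".

    Assumption: the prefix starts with "10." and the suffix may be any
    non-empty string, including an existing identifier such as an ISBN.
    """
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


if __name__ == "__main__":
    # The suffix carries the existing identifier unchanged.
    print(split_doi("10.1000/ISBN 0764548891"))
    print(split_doi("10.2224/2003-1-29-CENDI-DOI"))
```

Splitting only at the first "/" keeps any later slashes inside the suffix, which is what lets the suffix hold an arbitrary existing identifier.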
Handle resolution allows a DOI to link to any and multiple pieces of current data
• Resolve from DOI to:
– Location (URL): persistence
• Resolve to multiple data:
– Multiple locations
– Metadata
– Services
– Nested DOIs (related objects etc.)
– Extensible: new types
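Multiple resolution can be pictured as one identifier carrying several typed values. The sketch below is an in-memory stand-in, not the Handle System API: the RECORDS table, the type names, and the example.org URLs are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for a resolution service: each DOI maps to a list
# of (type, data) values, so one identifier can carry several kinds of data.
RECORDS: Dict[str, List[Tuple[str, str]]] = {
    "10.1234/5678": [
        ("URL", "http://example.org/mirror-a/5678"),
        ("URL", "http://example.org/mirror-b/5678"),
        ("METADATA", "title=Example item"),
        ("DOI", "10.1234/Norman_presentation"),  # nested DOI (related object)
    ],
}


def resolve(doi: str, wanted_type: Optional[str] = None) -> List[Tuple[str, str]]:
    """Return all values for a DOI, or only those of one type."""
    values = RECORDS.get(doi, [])
    if wanted_type is None:
        return list(values)
    return [(vtype, data) for vtype, data in values if vtype == wanted_type]


if __name__ == "__main__":
    print(resolve("10.1234/5678", "URL"))  # multiple locations
    print(resolve("10.1234/5678", "DOI"))  # related objects
```

New value types can be added to a record without changing the resolver, which is the sense in which the scheme is extensible.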
<indecs> framework: DOI can describe any form of intellectual property, at any level of granularity
• Metadata
• For interoperability
• Kernel metadata
– A standard, interoperable, small set of data
• Able to use existing metadata
– Mapped using a standard dictionary
• Providing a standard way of accessing and using the object
– “Hooks” to OpenURL, UDDI, etc.
– DOI Applications, Services
DOI policies allow any business model for practical implementations
• Common rules of the road (IDF)
– Governance and agreed scope, policy, rules
• Cost-recovery (self-sustaining)
• Registration agencies (cf. ISBN, Visa)
• Each can develop its own applications, services, sector rules, business model, fees, metadata, etc.
– DOI at cost
– DOI free
– DOI with other services
– etc.
Summary of the four components:
• <indecs> framework: DOI can describe any form of intellectual property, at any level of granularity
• Handle resolution allows a DOI to link to any and multiple pieces of current data
• DOI syntax can include any existing identifier, formal or informal, of any entity
• DOI policies allow any business model for practical implementations
• Extensible
DOI components
• The combination of components is unique
• Aim to use existing standards or, if none is available, to develop standards with others
• Numbering: standard principles (naming authorities, delegated responsibility, uniqueness, non-intelligent numbering, etc.)
• Resolution: DOI is a Handle implementation (initially single, now multiple resolution; close collaboration with CNRI as technology partner)
• Metadata: indecs framework (initially the <indecs> consortium, now ISO MPEG)
• Policies: based on similar business models (UPC, ISBN, Visa, etc.)
DOI: development path
[Figure: a continuing development activity. Labels: initial implementation; full implementation; single redirection (persistent identifier); multiple resolution; metadata; activity tracking; W3C, WIPO, NISO, ISO, MPEG etc.]
Persistent identifier
• Resolution provides persistence
• Easily seen in web applications: the DOI never changes, but the URL does
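The persistence argument comes down to one level of indirection. The toy directory below is only a sketch of that idea; the class DoiDirectory and the example.org URLs are hypothetical and do not describe how the real DOI directory is implemented.

```python
class DoiDirectory:
    """Toy directory: maps a stable DOI to the current URL of the content."""

    def __init__(self) -> None:
        self._urls = {}

    def register(self, doi: str, url: str) -> None:
        # Registering again simply replaces the URL; the DOI stays the same.
        self._urls[doi] = url

    def resolve(self, doi: str) -> str:
        return self._urls[doi]


if __name__ == "__main__":
    directory = DoiDirectory()
    directory.register("10.1234/5678", "http://example.org/old/location")
    # The content moves: only the directory entry is updated.
    directory.register("10.1234/5678", "http://example.org/new/location")
    # Anything that printed or bookmarked the DOI still resolves.
    print(directory.resolve("10.1234/5678"))
```

Printed references and bookmarks hold the DOI, so a move requires one directory update rather than a change to every published link.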
[Figure: printed identifiers, bookmarks, etc. carry URLs that point directly at the content]
[Figure: URLs pointing at the content fail with "404 File not found"]
"Linkrot": recent estimates 16% in 6 months
[Figure: DOIs, a DOI directory, URLs, content, and the assigner; references use DOIs, which the DOI directory maps to URLs for the content]
[Figure: DOIs, content, the assigner, and multiple DOI directories on the Internet]
More than just "locate"
[Figure: DOIs, DOI directory, content, assigner, and a response page]
Response page:
• purchase content
• view free excerpt
• get related items
• get additional metadata
• request permissions
[Figure: the same response page, with the "purchase content" option shown alongside a bookstore]
The “metadata” component
• “Interoperability of data in e-commerce systems”
• <indecs>: a multi-partner effort; see www.indecs.org
• Since adopted, and now the basis of the ISO MPEG-21 Dictionary approach: see the paper “Towards a Rights Data Dictionary”
• Unique Identification
• Functional Granularity
• Designated Authority
• Appropriate Access
• Metadata as “a relationship between two entities”
The “metadata” component: why this has been important to DOI
• Precision: a consistent extensible framework, for automation
• Terminology: defined the scope of “content” more precisely
– In relation to “Digital Objects”, W3C “resources”, WIPO “works”
– Description by precise attributes, ontology
• Ability to interoperate with any existing metadata
– SCORM, MARC, ONIX, etc.
• Link to standards work like MPEG, XML
• A way of defining “Application Profiles”
– sets of metadata plus rules, a way of grouping DOIs
– the basis of applications beyond simple persistence
– documentation now being completed
Metadata efficiency
<indecs> framework: DOI can describe any form of intellectual property, at any level of granularity
• Text objects (ONIX)
• Art objects (CIDOC)
• Learning objects (SCORM)
• Audio objects (GRID)
• Video objects (SMPTE)
• etc.
• Common single mapping
Adding value: services
• Acrobat plug-in as the focus example here (web-based)
• Four example demonstrations shown here:
– Version (provide a dynamically updated version of the PDF in hand)
– Multiple resolution (retrieve multiple data: a URL and some metadata in this case)
– CrossRef (retrieve a standard set of metadata and use it in an application, a citation builder)
– Rights (a very simple e-commerce interface as an illustration)
Adobe plug-in concept: what
PDF document viewed through Acrobat Reader
[Figure: Acrobat Reader tool bar with plug-in [cache] holding doi:10.123/456, and buttons for "Some Service" and "Another Service"]
• DOI is not visible: it sits within the PDF package (like File/Properties in Word, etc.)
• Buttons "pop up" dynamically as services become available
Demo 1 – “get latest version”
[Figure: tool bar in the reader; the DOI is sent over the Internet to the Handle System, which returns the Handle Record]
Handle Record (cnri.test.jsn/pdf):
TYPE            DATA
url             http://host-4-211/book-newversion.pdf
last_modified   2002-06-13T14:06:03-03:00
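The "get latest version" check can be sketched as a timestamp comparison against the values in the Handle Record. The record values below are the ones shown in the demo; the dictionary layout and the function latest_version_url are assumptions made for illustration, not the plug-in's actual logic.

```python
from datetime import datetime
from typing import Dict, Optional

# Values as shown in the demo's Handle Record; the dict layout is assumed.
HANDLE_RECORD: Dict[str, str] = {
    "url": "http://host-4-211/book-newversion.pdf",
    "last_modified": "2002-06-13T14:06:03-03:00",
}


def latest_version_url(local_modified: str, record: Dict[str, str]) -> Optional[str]:
    """Return the record's URL if it is newer than the local copy, else None.

    Both timestamps must be ISO 8601 strings with a UTC offset, so that the
    two datetimes are timezone-aware and can be compared.
    """
    remote = datetime.fromisoformat(record["last_modified"])
    local = datetime.fromisoformat(local_modified)
    return record["url"] if remote > local else None


if __name__ == "__main__":
    # A copy saved in January 2002 is older than the registered version.
    print(latest_version_url("2002-01-15T09:00:00+00:00", HANDLE_RECORD))
```

Keeping the comparison in the client and the data in the record matches the split described later in the deck: functional units in the DOI record, knowledge of what to do with them in the client.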
Demo 2 – Multiple Resolution
[Screenshots: related links]
Demo 3 – Citation
[Screenshots: tool bar]
Demo 4 – Permissions
[Screenshots: tool bar with XMP; "Rights button!"]
What we have done
• Put the DOI data in functional units in the DOI record [Handle], and the knowledge of what to do with them in the client
– Demonstrated with an end-user client (Acrobat) but equally applicable to middleware
– No constraints on adding additional functional units to a given DOI
– A common approach: could use the same Handle record to manage PDF, HTML, mobile, etc., hence efficient in deploying content across platforms
– The resolution to returned metadata through Application Profiles allows complex applications
• Provided a complete packaged solution: numbering, resolution, metadata, policies
– On which individual applications and services can be built
– The same additional components could be of interest to other Handle applications: metadata, policies
– Avoid reinvention of the wheel
Handles in DOI
DOI record, created and maintained by content providers:
10.123/456
  URL          http://www....
  DOI_AP       10.AP/2
  DOI_ATR      10.ATR/Latest; 22/10/2002
  DOI_AP       10.AP/1; KMD; RA; URL
AP (service aggregation) DOI, to be defined and maintained by Registration Agencies:
10.AP/2
  Desc         Some description
  DOI_Service  10.Service/Metadata; Schema23; http://...
  DOI_Service  10.Service/Latest
Service description DOIs, to be defined by service providers:
10.Service/Latest
  Desc         Some description
  IDL          IDL description
  Java         Java Interface
  WSDL         Soap Binding
  IOR          IOR:0001100...
10.Service/Metadata
  Desc         Some description
  IDL          IDL description
  Java         Java Interface
  WSDL         Soap Binding
  IOR          IOR:0001100...
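One way to read the records on this slide is as a chain: a DOI names its Application Profiles, and each AP names the services it aggregates. The sketch below follows that chain over an in-memory copy of the slide's values. Treating the text before the first ";" as the referenced handle is an assumption, and RECORDS, first_field, and services_for are hypothetical names.

```python
from typing import Dict, List, Tuple

# In-memory copy of two records from the slide (elided URLs kept as shown).
RECORDS: Dict[str, List[Tuple[str, str]]] = {
    "10.123/456": [
        ("URL", "http://www...."),
        ("DOI_AP", "10.AP/2"),
        ("DOI_ATR", "10.ATR/Latest; 22/10/2002"),
        ("DOI_AP", "10.AP/1; KMD; RA; URL"),
    ],
    "10.AP/2": [
        ("Desc", "Some description"),
        ("DOI_Service", "10.Service/Metadata; Schema23; http://..."),
        ("DOI_Service", "10.Service/Latest"),
    ],
}


def first_field(data: str) -> str:
    """Assumption: the referenced handle is the text before the first ';'."""
    return data.split(";")[0].strip()


def services_for(doi: str) -> List[str]:
    """Follow DOI -> DOI_AP -> DOI_Service and list the service handles."""
    services: List[str] = []
    for vtype, data in RECORDS.get(doi, []):
        if vtype != "DOI_AP":
            continue
        # APs with no record in this toy table (here 10.AP/1) are skipped.
        for ap_type, ap_data in RECORDS.get(first_field(data), []):
            if ap_type == "DOI_Service":
                services.append(first_field(ap_data))
    return services


if __name__ == "__main__":
    print(services_for("10.123/456"))
```

A client such as the Acrobat plug-in could use a list like this to decide which service buttons to show for a given DOI.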
Who is using it now?
• Several hundred organisations
• Several million DOIs
• Examples:
– CrossRef
– Content Directions Inc
– TSO (The Stationery Office)
– and others (Europe, US, Asia)
Who has done this?
• International DOI Foundation (IDF)
• Open member organisation, launched 1998
• Members: publishing, technology, intermediaries
• Modelled on W3C, and on the bar code development
• www.doi.org
More information?
• Web site at http://www.doi.org
• DOI Handbook [http://www.doi.org/hb.html]
• DOI news [e-mail sign-up on site]
• DOI FAQs [http://www.doi.org/faq.html]
• Metadata:
– indecs framework [http://www.indecs.org]
– “Towards a Rights Data Dictionary” [http://www.doi.org/topics/020522IMI.pdf]
Appendix
• Supplementary material
• DOI Application Profiles concept
• Supporting IDF: benefits
• IDF development path
• DOI and internet standards
DOI Application Profiles
Each Profile can be thought of as built from the kernel + extensions:
[Figure: DOI AP = compulsory kernel for any DOI + metadata for application]
Metadata elements
• An application (e.g. AP10) may be defined in terms of another scheme, e.g. ONIX
• There must be a mapping for each element, e.g. ONIX “Page” = iid 734 (DOI Term Set)
• DOI users can see metadata as all defined in DOI terms
• The advantage is in additional schemes/mappings (e.g. AP10, AP27)
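The mapping requirement can be sketched as a lookup from (scheme, term) to an identifier in the DOI Term Set. Only the ONIX “Page” = iid 734 pair comes from the slides; the second scheme, its term, and the function names are hypothetical.

```python
from typing import Dict, Tuple

# (scheme, term) -> iid in the DOI Term Set.
TERM_MAP: Dict[Tuple[str, str], int] = {
    ("ONIX", "Page"): 734,             # example given on the slide
    ("OtherScheme", "PageCount"): 734,  # hypothetical second scheme
}


def to_doi_terms(record: Dict[str, str], scheme: str) -> Dict[int, str]:
    """Re-express a record's elements as DOI terms.

    Raises KeyError if any element has no mapping, reflecting the rule
    that every element of an application profile must be mapped.
    """
    return {TERM_MAP[(scheme, term)]: value for term, value in record.items()}


if __name__ == "__main__":
    onix_record = {"Page": "212"}
    other_record = {"PageCount": "212"}
    # Both records look the same once expressed in DOI terms.
    print(to_doi_terms(onix_record, "ONIX"))
    print(to_doi_terms(other_record, "OtherScheme"))
```

Each new scheme needs one mapping into the term set rather than a mapping to every other scheme, which is where additional schemes pay off.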
Benefits of supporting DOIs
• Persistent identification
– Not just a location
– Permanent, trackable name
– Stays the same if ownership, location, or control changes
– No need to update customers if location changes
• Can incorporate existing identifiers
– Standard, e.g. ISBN, ISSN, ISMN, SICI, ISRC
– Non-standard / public, e.g. PII
– Private, e.g. workflow, internal production
– Assigned by the publisher, or on the publisher's behalf
• Can interoperate metadata standards
– Application profiles, kernel metadata, indecsDD
Using DOIs
• Automated link from DOI to any (and multiple) points
– Controlled by the assigner
– e.g. multiple locations, purchase options, additional info, and access control can be made available and controlled globally by the publisher, and can be invoked globally by an intermediary, etc.
• Build your own custom features: entirely extensible architecture
• Generic applicability: any form of intellectual property, any granularity (text, music, audio, ...)
– Simple standard metadata associated with each DOI to ensure interoperability
• Conforms to, and works with, existing standards
Business benefits
• Promotes ready use of material in a legal, controllable manner
• Proven, implemented, real system in use now
– e.g. CrossRef: 160+ publishers, around 3 million DOIs per year since Jan 2001, around 2 million resolutions per month, supports existing businesses
• Demonstrated unique additional features
– multiple resolution; DOI-APs
– use of these limited only by your imagination
• Low risk
– not a proprietary system; available at low cost
– controlled by a neutral, not-for-profit Foundation with a single aim
– built on open standards
– comprehensive effort reduces risk of "dead-end": Asia as well as EU, US; multimedia, e.g. text, music, software
Leverage other activities
IDF participates in other efforts:
• W3C, IETF DRM activities
• PRISM, ONIX, indecs2, ...
• ISO TC46, ISO MPEG
• NISO, WIPO, etc.
• Music industry: GRID, CR Forum
• Content ID Forum (Japan)
• Indecs
• TV Anytime
• etc.
No one company can participate in all these
Why is support needed?
• If this is desirable, it must be paid for
• Membership supports development until the operating federation takes over
• Community invests now to get benefit for all
• Coordinated work to provide efficient operation
• Ensure consistent deployment and avoid fragmentation
• Prevent conflicts and promote efficiency
• Outreach to other efforts
Benefits of supporting IDF
• Ensure the DOI is widely implemented
– Existing applications need the underpinning of consistent rules, infrastructure, and wide uptake
• Ensure the content community sets standards
– Technology standards are not enough (Napster)
– No other existing forum is doing this: W3C, OEBF, MPEG21, etc. are all looking at parts
• DOI results from extensive work by AAP, IPA, STM (1997+): a consistent development path
• IDF has a strong position, and support
– Content and technology communities are represented
• Promote collaboration
– interoperate with others; reduce costs, prevent mistakes
– provide a common platform but retain the ability to build added-value services
Benefits of supporting IDF
• Cost-effective way of gaining access to expertise
– Cost is equivalent to 2-3 man days per month of one consultant (even at highest membership level)
– Detailed monthly briefings on other activities (WIPO, W3C, IETF, MPEG, ISO, OEBF, SIIA, etc.), and more expertise available on request
• Preferential access to business opportunities
– IDF makes connections between members and potential applications: explore possible business opportunities at low risk
– Early access to results of prototypes, plans
• Share cost of development of prototypes
– Costs can be shared by participants
• Influence the course of the IDF
– participate in working groups, annual meeting, prototypes, board
Registration Agencies
• An additional business opportunity for some members
• Build on the features and acceptance of the system
– build on existing services or offer new services
– management of content, management of metadata, etc.
• RAs may build as little or as much as they wish on this
– simple assignment, through to a wide range of services
• RAs determine their own fate
– IDF provides a federal structure for infrastructure, predictable costs, and governance model
– open market structure for applications
• Business opportunity is a shared risk
– DOI service supported by multiple RAs and multiple applications
– Shared costs of the infrastructure
– common infrastructure encourages common added-value tools
DOI: development path
[Figure: a continuing development activity. Labels: initial implementation; full implementation; single redirection (persistent identifier); multiple resolution; metadata; activity tracking; W3C, WIPO, NISO, ISO, UDDI etc.]
DOI: components
• A number (or “name”)
– assign a number to something
– (compare: telephone number)
• A description
– what the number is assigned to
– (compare: directory entry)
• An action
– make the number do something
– (compare: the telephone system)
• Policies
– how to get a phone number; billing (compare: social structures)
Our aim: Building infrastructure
“Imagine a country where nobody can identify who owns what, addresses cannot easily be verified, people cannot be made to pay their debts, resources cannot conveniently be turned into money, ownership cannot be divided into shares, descriptions of assets are not standardized and cannot easily be compared, and the rules that govern property vary from neighbourhood to neighbourhood or even street to street. You have just put yourself into the life of a developing country or former communist nation”
“One of the most important things a formal property system does is transform assets from a less accessible condition to a more accessible condition, so that they can do additional work. Unlike physical assets, representations are easily combined, divided, mobilized, and used to stimulate business deals. By uncoupling the economic features of an asset from their rigid, physical state, a representation makes the asset "fungible" - able to be fashioned to suit practically any transaction.”
Both quotations: “The Mystery of Capital: Why Capitalism Succeeds in the West and Fails Everywhere Else” by Hernando de Soto (2000)
DOI: provide the tools for representations of intellectual property
Internet standards: DOI, URN and URL
• Distinguish two issues:
1. The technical specification of “what is” a URN and a URI
2. What this means for practical implementation
1. Internet specs
• See DOI Handbook chapter 4
– 4.9 DOI as a URI
– 4.10 DOI as a URN
– equally true of all HDLs: DOIs are HDLs
• Aim: DOIs are persistent across time and unique across network space
• DOIs are URIs (formally: draft specification)
• DOIs are URNs (in effect)
• URN and URI proponents disagree
– “the URN wars”
[Figure: URI, URN and URL; schemes urn:, ftp:, gopher:, http:; resolution (N2L). Source: http://www.w3.org/addressing (but largely from IETF; W3C did not see the need for URN)]
DOI as URI
• IETF formal spec “URI scheme for Digital Object Identifier”
– Paskin, Norman; Neylon, Eamonn; Hammond, Tony; Sun, Sam; Uniform Resource Identifier (URI) scheme for Digital Object Identifiers (DOIs); http://www.ietf.org/internet-drafts/draft-paskin-doi-uri-00.txt (February 2002)
– An abstract specification (uri:doi:)
– Would be doi: (like tel:) [uri: is not part of the URI spec, unlike urn:]
• May be a pure name or de-referenced by any service
– The namespace provides its own mechanism (“Bootstrapping”)
• RFC 2396: UTF-8 encoding allows non-Roman characters
• On its own, it’s just a specification!
• Requires code distribution for any implementation
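The point about UTF-8 and non-Roman characters can be shown with percent-encoding. This is a sketch only: it assumes a "doi:" scheme prefix as the slide suggests, that "/" is left unescaped, and that non-ASCII characters are escaped as UTF-8 bytes. The draft specification may define the details differently, and the function names are hypothetical.

```python
from urllib.parse import quote, unquote


def doi_to_uri(doi: str) -> str:
    """Percent-encode a DOI (UTF-8 bytes) and add the assumed doi: prefix."""
    return "doi:" + quote(doi, safe="/")


def uri_to_doi(uri: str) -> str:
    """Reverse doi_to_uri; reject strings without the doi: prefix."""
    if not uri.startswith("doi:"):
        raise ValueError(f"not a doi: URI: {uri!r}")
    return unquote(uri[len("doi:"):])


if __name__ == "__main__":
    uri = doi_to_uri("10.1234/résumé")
    print(uri)              # non-ASCII characters appear as %XX escapes
    print(uri_to_doi(uri))  # round-trips to the original DOI
```

As the slide notes, an encoding like this is only a specification of the name: nothing resolves the URI until software that understands the scheme is distributed.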
DOI as URN
• URN is less clear:
– Higher level situation muddy
– Set of IETF drafts that define URN
– Set of registered namespaces (e.g. isbn)
• DOI could be a registered namespace but isn’t: no advantage
• Unlike URI, provides a specific DNS-based middle layer (RDS) to find the appropriate resolution service
• Scalability and security questioned; and:
• Little or no resolution implementation
– Resolution proposed is one specific way:
– NAPTR (Name Authority Pointer) turns urn:hdl:10.1000/1 into http://hdl.handle.net/10.1000
– Recently DDDS (Dynamic Delegation Discovery System): variant of NAPTR
DOI as URN
• urn:isbn:123456789 can be defined; but what does it do over and above isbn:123456789?
– neither has a readily available, well-known, global resolution
• What if NAPTR were widely deployed (5 years on)?
• Some advantage: could redirect from one URL proxy to another
– urn:doi to http://dx.doi.org/, redirected to http://dx2.doi.org
• But this is a “regular expression”: not software
• And there are still worries about DNS issues
– “Gratuitous use of DNS”
– DNS name servers are widely distributed: inertia
– No security of resolution
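The remark that a NAPTR rule is "a regular expression, not software" can be made concrete. The sketch below imitates one such rewrite rule with Python's re module; it is not a DNS or NAPTR implementation. The rule tuples, the urn:doi: form, and the function apply_rule are assumptions for illustration, with the two proxy hosts taken from the slide.

```python
import re
from typing import Optional, Tuple

Rule = Tuple[str, str]  # (pattern, replacement), standing in for one rewrite rule

# Redirecting to another proxy means publishing a different rule, not new code.
RULE_DX: Rule = (r"^urn:doi:(.+)$", r"http://dx.doi.org/\1")
RULE_DX2: Rule = (r"^urn:doi:(.+)$", r"http://dx2.doi.org/\1")


def apply_rule(urn: str, rule: Rule) -> Optional[str]:
    """Rewrite a URN to a proxy URL, or return None if the rule does not match."""
    pattern, replacement = rule
    if re.match(pattern, urn, flags=re.IGNORECASE) is None:
        return None
    return re.sub(pattern, replacement, urn, flags=re.IGNORECASE)


if __name__ == "__main__":
    print(apply_rule("urn:doi:10.1000/1", RULE_DX))
    print(apply_rule("urn:doi:10.1000/1", RULE_DX2))
```

The rewrite only produces another URL; something still has to resolve that URL, which is why the slide treats the rule as falling short of a resolution service.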
1. Internet specs
• Persistence across time and network space is desirable
• Do not want to bet on the URN logic of putting a resolution system in front of resolution systems
– Especially the one proposed
• But
– DOIs ARE URIs (formally)
– DOIs ARE URNs (in effect)
• But: this is not the most important issue!
2. Practical implementation
• Irrespective of all this URI/URN specification, DOIs are still needed, still useful, still valid
• A DOI is more than an HDL
– Adds policy, business rules, business model
– Adds metadata specifications (cf. ISBN, EAN, Visa)
• e.g. mappings:
– Ensures semantic integrity
– A technical exercise:
– A term is assigned a unique value in the iDD
– Given a genealogy and ContextDescription
– Other information added
– A mapped term becomes part of the dictionary
• Hence will become more useful as it grows
– Consensual between the two things being mapped
– Painstaking, but once-only
– Specialist services requiring intellectual input
2. Practical implementation
• On this topic, see:
• DOI Handbook Ch. 3.6: Social infrastructure
• DOI Handbook Ch. 6: on the Handle System and using HDL without DOIs
• DOI Handbook Ch. 13: on RAs and using DOIs without RAs