
Identification of Electronic Resources: identifiers and resolution services

Juha Hakala

Helsinki University Library

2003-01-29


Background

• Rapid growth of electronic publishing has revealed fundamental problems in our existing identifier systems

• Rules of implementation and/or syntax must be changed, new systems must be developed, and resolution systems utilising the identifiers must be built

• Every resource must have an identifier!


Scope of the work

• Identifiers for authors (ISADN)

• Identifiers for works (ISTC, ISAN, …)

• Identifiers for manifestations (ISBN,…)

• Identifiers for component parts (SICI, BICI)

• Resolution services (DOI, URN)

– Always incorporate an identifier


ISSN

• Capacity of the system is sufficient

– 1.5 million out of 10 million IDs in use

• Many open issues

– How to implement ISBD(CR) in practice?

– Staffing: national ISSN centres need more cataloguers

– Web journals are not stable; MARC field 856 versus OpenURL

– Data utilisation problem: the global ISSN database system requires modernisation

• Faster updates from national centres, on-line Z39.50 access


ISBN

• Capacity is an issue

– We will run out of ISBNs within a decade

• Rule problem: publishers also want to assign ISBNs to component parts (instead of using BICI) in order to simplify their systems

• Utilisation problem: there is no global ISBN database (and may never be)


The New ISBN: current plans

• ISO/CD 2108, dated 2003-01-17

– For the ISO/TC 46/SC 9/WG 4 meeting, 30-31 January 2003

• Bookland EAN prefix (978) will be added; otherwise the structure remains the same

– ISBN-13 978-90-70002-34-3

– As a result, every old number can be re-used

• Check digit is calculated using the Modulus 10 algorithm (ISBN-10 uses Modulus 11, with check characters 0-9 and X)
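The conversion described above can be sketched in a few lines of Python. This is a minimal illustration, not text from ISO/CD 2108: the function names are hypothetical, and the ISBN-10 form 90-70002-34-5 is derived here from the slide's ISBN-13 example by applying the Modulus 11 rule. The Modulus 10 weights (1 and 3, alternating from the left) are those of the EAN-13 scheme.

```python
def isbn10_check_digit(first9: str) -> str:
    """Modulus 11 check character for the first nine digits of an ISBN-10."""
    # Weights run 10, 9, ..., 2 from left to right.
    total = sum(int(d) * w for d, w in zip(first9, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def isbn13_check_digit(first12: str) -> str:
    """Modulus 10 check digit; weights alternate 1, 3 from the left."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def isbn10_to_isbn13(isbn10: str) -> str:
    """Prefix a hyphenated ISBN-10 with 978 and recompute the check digit."""
    groups = isbn10.split("-")       # e.g. ["90", "70002", "34", "5"]
    body = "".join(groups[:-1])      # nine digits, without the check character
    if len(body) != 9 or not body.isdigit():
        raise ValueError(f"not a hyphenated ISBN-10: {isbn10!r}")
    if isbn10_check_digit(body) != groups[-1].upper():
        raise ValueError(f"bad ISBN-10 check character: {isbn10!r}")
    check = isbn13_check_digit("978" + body)
    # The group structure is unchanged; only the prefix and check digit differ.
    return "-".join(["978", *groups[:-1], check])


print(isbn10_to_isbn13("90-70002-34-5"))
```

Only the prefix and the final digit change, which is why every old number can be carried over mechanically.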


The New ISBN: some cancelled ideas

• Make the ISBN an ISSN-like dumb number

– Enhanced capacity, but reduced usefulness

• Create a global ISBN database

– Technically, organisationally and politically controversial idea

– Instead, national centres will make their data available

• Extend the ISBN to 16/25/32 digits

– Would have broken the EAN system


National Bibliography Number

• Traditionally: the identifier for records in the national bibliography, if the publication did not have an identifier

• New scope: identifier for (electronic) resources to which no other identifier applies

• Implemented as URNs in order to guarantee global uniqueness


National Bibliography Number: examples

• All implementations based on RFC 3188

• Finnish Web Archive (11.7 million files)

– Machine-generated ID based on MD-5

– urn:nbn:fi:fa<MD-5>

• Koninklijke Bibliotheek’s E-depot

– urn:nbn:nl:kb:eDepot-<UNIX time>
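A minimal sketch of how identifiers following these two patterns could be generated. The slides give only the URN templates, so several details here are assumptions: that the MD-5 digest is computed over the file content and written as lowercase hexadecimal, and that the UNIX time is an integer number of seconds. The function names are hypothetical.

```python
import hashlib
import time
from typing import Optional


def web_archive_urn(content: bytes) -> str:
    """urn:nbn:fi:fa<MD-5>, assuming the digest is taken over the file content."""
    digest = hashlib.md5(content).hexdigest()  # 32 lowercase hex characters
    return "urn:nbn:fi:fa" + digest


def edepot_urn(timestamp: Optional[int] = None) -> str:
    """urn:nbn:nl:kb:eDepot-<UNIX time>, assuming whole seconds since the epoch."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"urn:nbn:nl:kb:eDepot-{timestamp}"


print(web_archive_urn(b"<html>archived page</html>"))
print(edepot_urn())
```

A content hash yields the same identifier for identical files, whereas a timestamp depends on when the identifier is assigned; a production system based on timestamps would need some safeguard against two assignments in the same second.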


Uniform Resource Name

• Internet standard; approved in fall 2002

• Both an identifier and resolution service (mechanism for linking identifier and resource in the Internet)

• Designed to be protocol independent; the current version is built on top of DNS, but the infrastructure can be changed


URN: syntax

• Specified in RFC 2141 (1997)

• Three sections, separated by colons

– String urn

– Namespace identifier (NID)

– Namespace specific string (NSS)

– urn:nid:nss
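The three-part syntax can be illustrated with a small parser. This is a simplified sketch rather than a full RFC 2141 validator: it checks the leading "urn" string and the shape of the NID (1 to 32 letters, digits or hyphens, starting with a letter or digit), but it does not validate the characters of the NSS. The names `Urn` and `parse_urn` are hypothetical.

```python
import re
from typing import NamedTuple


class Urn(NamedTuple):
    nid: str  # namespace identifier, e.g. "nbn"
    nss: str  # namespace specific string, e.g. "fi:fa..."


# NID: a letter or digit, followed by up to 31 letters, digits or hyphens.
_NID = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]{0,31}")


def parse_urn(text: str) -> Urn:
    """Split 'urn:nid:nss' into its parts; the NSS may itself contain colons."""
    parts = text.split(":", 2)
    if len(parts) != 3 or parts[0].lower() != "urn":
        raise ValueError(f"not a URN: {text!r}")
    _, nid, nss = parts
    if not _NID.fullmatch(nid):
        raise ValueError(f"invalid namespace identifier: {nid!r}")
    if not nss:
        raise ValueError("empty namespace specific string")
    # The leading "urn" and the NID are case-insensitive, so the NID is lowercased.
    return Urn(nid.lower(), nss)


print(parse_urn("URN:NBN:nl:kb:eDepot-1043798400"))
```

Splitting on only the first two colons matters because namespaces such as NBN use further colons inside the NSS to delegate sub-namespaces.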


URN: services

• Supply the actual document

• Deliver metadata related to the document

• Pass the list of URLs from which the resource can be found


URN: namespace registration

• Each namespace must be registered as specified in RFC 2611

– Registration must contain the proposed NID (such as “nbn”) and an outline of how the global URN resolver discovery service will function within the namespace

• Registrations are approved as informational or normative RFCs


Administration of the NBN namespace

• Each national library is allowed to do whatever it wants with its own part of the NBN namespace (as long as the identifiers remain unique and persistent)

• National Library of Finland has assigned some organisations their own sections

– Library of Congress could do the same


URN: resolution process

• Based on DNS; there is a resource record which describes the location of the service that can resolve URNs with a given NID/NSS combination

• Complexity of the resolution process varies

– ISSN – single database is enough

– ISBN – databases of national centres will do

– SICI – a huge number of A&I services needed
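The idea of resolver discovery can be shown with a toy model. The sketch below does not query DNS at all: a plain dictionary stands in for the resource records, and the longest matching prefix decides which resolver is responsible for a URN. The table contents, the example.org addresses and the function name are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical stand-in for the DNS resource records: URN prefix -> resolver.
RESOLVER_TABLE: Dict[str, str] = {
    "urn:issn:": "https://issn-resolver.example.org/",
    "urn:nbn:fi:": "https://nbn-fi-resolver.example.org/",
    "urn:nbn:fi:fa": "https://web-archive-resolver.example.org/",
    "urn:nbn:nl:": "https://nbn-nl-resolver.example.org/",
}


def discover_resolver(urn: str,
                      table: Dict[str, str] = RESOLVER_TABLE) -> Optional[str]:
    """Return the resolver registered for the longest matching URN prefix."""
    # Lowercasing is used here only to match the namespace prefix; the URN
    # handed on to the resolver should keep its original spelling.
    lowered = urn.lower()
    matches = [prefix for prefix in table if lowered.startswith(prefix)]
    if not matches:
        return None
    return table[max(matches, key=len)]


print(discover_resolver("urn:nbn:fi:fa0123456789abcdef"))
print(discover_resolver("urn:issn:1234-5679"))
```

In this model ISSN needs a single entry, while a namespace such as NBN needs one entry per delegated section.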


URN: some benefits

• No assignment cost

• Trivial to create from existing identifiers

– Add a fixed prefix, such as urn:nbn:us:cornell:

• Internet standard; support will gradually be included in the basic tools we use

• Present architecture for resolver discovery service is robust and scalable
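Creating a URN from an existing identifier can be sketched as follows. The prefix is the one given on the slide; the local identifiers and the function names are made up for illustration. The one non-trivial step is that characters outside the set RFC 2141 allows in a URN are written as %-escaped UTF-8 bytes.

```python
import string

# Characters RFC 2141 allows unescaped in the namespace specific string.
_URN_CHARS = set(string.ascii_letters + string.digits + "()+,-.:=@;$_!*'")


def encode_nss(text: str) -> str:
    """%-escape every character that may not appear literally in a URN."""
    out = []
    for ch in text:
        if ch in _URN_CHARS:
            out.append(ch)
        else:
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)


def to_urn(local_id: str, prefix: str = "urn:nbn:us:cornell:") -> str:
    """Build a URN by adding a fixed prefix to an existing local identifier."""
    if not local_id:
        raise ValueError("empty identifier")
    return prefix + encode_nss(local_id)


print(to_urn("thesis-2003-0042"))   # hypothetical local identifier
print(to_urn("report 17/a"))        # space and slash are escaped
```

No registry call is involved, which is why assignment carries no cost once the namespace section has been delegated.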


URN: some problems

• Someone must pay for the implementation of resolution services in e.g. ILSs

• Commercial publishers prefer DOI

• Only a handful of systems have registered namespaces

– E.g. ISSN, ISBN, NBN

• Dumb identifier with multiple resolution services does not fit into the system well (although there may be a cascade of resolvers)


URN versus DOI

• DOI system is a technology, not a standard

– Standardisation of DOI syntax is not enough; services and the practical implementation of the resolution mechanism must also be “fixed”

• Handle system has failed to attract IETF interest

• In DOI system, anything can be used as an identifier (suffix)

• DOI requires registration of registrants (publishers)
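For comparison, a DOI has two parts separated by the first slash: a prefix beginning with "10." that identifies the registrant, and a free-form suffix chosen by that registrant. The sketch below splits a DOI and builds a link through the public HTTP proxy at https://doi.org/. The sample DOI and the function names are hypothetical, and percent-encoding the suffix with `urllib.parse.quote` is a simplification made for this sketch.

```python
from typing import Tuple
from urllib.parse import quote


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI into (prefix, suffix) at the first slash."""
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or len(prefix) <= 3 or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def doi_to_url(doi: str, resolver: str = "https://doi.org/") -> str:
    """Build an HTTP link that asks the proxy to resolve the DOI."""
    prefix, suffix = split_doi(doi)
    # The suffix may contain almost anything, so it is percent-encoded here.
    return resolver + prefix + "/" + quote(suffix, safe="/")


print(split_doi("10.1234/any.local-id/42"))
print(doi_to_url("10.1234/any.local-id/42"))
```

Because the suffix is unconstrained, an existing identifier such as an ISBN can be embedded in it unchanged.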


URN versus DOI (2)

• DOI syntax is mandated by the Handle system

• Actual DOI implementations are dependent on HTTP protocol (which will not last forever)

• Handle system may become (but is not yet truly) distributed

• DOI has been widely implemented and works OK

– Only one DOI service: retrieval of the resource


URN and ENCompass

• Endeavor has no immediate plans to develop a URN resolution service

– I.e. a mechanism for receiving URN resolution requests arriving via DNS

• URNs can, however, be stored in metadata or in the documents themselves, and indexed

– This means that implementing a URN resolution service should not be complicated (it was designed to be easy to develop)