

library staff. Start a web page with progress reports, information, and maybe photographs of how things look.

Jean Dickinson
N209 Love Library,
P.O. Box 880410,
University of Nebraska,
Lincoln, NE 68588-0410, USA
E-mail address: [email protected] (J. Dickinson).

Making sense of digital identifiers for Internet and other online applications: summary of the LITA preconference

In the keynote address for the morning session, Clifford Lynch, Executive Director of the Coalition for Networked Information (CNI), provided an overview of the importance of and controversies surrounding digital identifiers, observing that the issues go beyond a discussion of standards. Noting that, unlike passive identifiers or citations in a non-digital environment, identifiers in a digital world are “actionable,” that is, they can immediately translate into the thing itself, Lynch went on to review briefly major applications, problems, and standards for digital identifiers.

Dividing identifiers broadly into those identifying intellectual content and those focusing on market or commercial distinctions (for example, a hard-back vs. a paper-back edition), Lynch cited several applications of digital identifiers including the creation of bibliographies allowing one to move directly from the citation to the work cited; the development of archival systems allowing the retrieval of content many years into the future; the incorporation of metadata into identifiers; and the creation of stable virtual or logical repositories.

Problems cited by Lynch included legal issues, such as those raised by the recent Microsoft/TicketMaster case; multiple frameworks and metadata models needed for different purposes; granularity issues, including the question of the demarcation between identifiers and navigation through the thing being identified; and the need for the creators of digital sites or collections to consider being “linked to,” as well as “linking-to,” addressing such questions as identifying schemes; the identification of individual objects within the site; and the creation of a link-friendly site.

Noting the existence of multiple standards for identifiers, Lynch alluded to the difficulties of moving from standards to implementation, given the significant infrastructure and cost issues that must be addressed. Lynch identified three components needed for an operational identifier system on the Internet: an assignment system, either hierarchical or distributed; a resolver system that addresses economic, privacy, and security issues; and reverse look-up services. In conclusion, Lynch raised an additional unresolved issue: the consistency of the identifier when the “same” content may be accessed in several different places, for example, in a publisher database and in aggregator databases. Additional discussion of some of the issues raised in Lynch’s talk may be found online at http://www.arl.org/newsltr/194/identifier.html [1].
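The three components Lynch listed can be pictured with a small sketch. This is only an illustration under stated assumptions, not a description of any deployed system: the class name `IdentifierRegistry`, the `prefix/serial` identifier format, and the example URL are all hypothetical, and the economic, privacy, and security concerns Lynch attached to the resolver are left out entirely.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class IdentifierRegistry:
    """Toy in-memory registry (hypothetical design, for illustration only)."""

    prefix: str  # naming authority; a hierarchical scheme would nest these
    _serial: count = field(default_factory=lambda: count(1))
    _location_of: dict = field(default_factory=dict)    # identifier -> location
    _identifier_of: dict = field(default_factory=dict)  # location -> identifier

    def assign(self, location: str) -> str:
        """Assignment system: mint a new identifier under this prefix."""
        identifier = f"{self.prefix}/{next(self._serial)}"
        self._location_of[identifier] = location
        self._identifier_of[location] = identifier
        return identifier

    def resolve(self, identifier: str) -> str:
        """Resolver: turn an identifier into the current location."""
        return self._location_of[identifier]

    def reverse_lookup(self, location: str) -> str:
        """Reverse look-up: find the identifier for a known location."""
        return self._identifier_of[location]


registry = IdentifierRegistry("example")
article_id = registry.assign("http://publisher.example/article/1")
```

Here `registry.resolve(article_id)` returns the stored location and `registry.reverse_lookup(...)` goes the other way. The unresolved issue Lynch raised, the "same" content held in both publisher and aggregator databases, is exactly what this one-to-one table cannot express.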

Norman Paskin, Director of the International DOI Foundation (IDF), spoke next on the

280 Conference Reports / Libr. Coll. Acq. & Tech. Serv. 24 (2000) 267–350

“The IDF and the Problems of Designating Digital Object Identifiers.” Paskin touched briefly on nine aspects of digital object identifiers (DOIs): copyright management; persistence; resolution; scope typology; metadata; business models; applications; development; and current issues. Noting that the IDF (http://www.doi.org) was established by member organizations representing publishers, authors, copyright agencies, the music industry, and technology companies to address intellectual property concerns and copyright management issues in a digital environment, Paskin described the DOI as conforming to the criteria for Uniform Resource Names (URN) [2]. As such, the DOI is global, unique, persistent, scalable, capable of legacy support for earlier naming systems, extensible, and independent. Currently the DOI resolves to a single URL, using the Handle System (http://www.handle.net); in the future, it may resolve to multiple URLs representing multiple manifestations or data types having the same content or scope (that is, content in which intellectual property rights may exist).

Given the possibility of multiple manifestations of the same content, it becomes particularly important that the DOI be bound with accompanying metadata that clarifies what is being identified, as for example, a print version or a digital version. At a minimum such metadata needs to provide basic information about the entity sufficient for reverse lookup, similar to the relationship between a telephone number and the person/organization to which it belongs; but such data elements may be considered as only a subset of additional metadata information that might be provided for use by other applications. The Interoperability of Data in E-Commerce Systems Project (INDECS, http://www.indecs.org) has recently developed design principles for metadata that the IDF has endorsed. These principles combine uniqueness with functional granularity; in other words, something needs to be identified only when there is a need to distinguish it: “what the DOI identifies is up to the user.” In addition, the INDECS guidelines call for identification of the author of the metadata; application/platform independence; and appropriate access.
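A rough sketch may help show what binding a DOI to metadata could look like once one identifier can resolve to several manifestations. Everything named here is an assumption made for illustration: the `DoiRecord` and `Manifestation` types, the choice of a title as the minimal reverse-lookup field, the example URLs, and the DOI string itself (which is made up) do not come from the IDF or INDECS specifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Manifestation:
    data_type: str  # e.g. "html" or "pdf"; says what is being identified
    url: str


@dataclass(frozen=True)
class DoiRecord:
    doi: str
    title: str             # minimal descriptive metadata for reverse lookup
    manifestations: tuple  # one or more Manifestation objects

    def resolve(self, data_type=None):
        """Return one URL: the first by default, or the requested data type."""
        if data_type is None:
            # Mirrors the single-URL behaviour described in the talk.
            return self.manifestations[0].url
        for manifestation in self.manifestations:
            if manifestation.data_type == data_type:
                return manifestation.url
        raise LookupError(f"{self.doi} has no {data_type!r} manifestation")


def reverse_lookup(records, title):
    """Find DOIs from descriptive metadata, like a telephone directory."""
    wanted = title.casefold()
    return [r.doi for r in records if r.title.casefold() == wanted]


record = DoiRecord(
    doi="10.9999/example.1",  # made-up DOI, for illustration only
    title="An Example Article",
    manifestations=(
        Manifestation("html", "http://publisher.example/a1.html"),
        Manifestation("pdf", "http://publisher.example/a1.pdf"),
    ),
)
```

Calling `record.resolve()` gives the single default URL, while `record.resolve("pdf")` selects a specific manifestation. Without the `data_type` metadata there would be no principled way to choose between them.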

Turning to a DOI business model, Paskin noted that a DOI system adds value, but incurs costs in registration (including metadata declaration); infrastructure (resolution services, scaling, and development); and governance. The aim is cost recovery achieved through funding by the registrants. The IDF paradigm calls for outsourcing of the registration and infrastructure activities, while retaining governance by the Foundation, a model similar to the business model for bar codes. DOI applications mentioned by Paskin included metadata collection and look-up prototypes, workflow implementations, and rights management. Paskin saw future development in the area of multiple resolutions and standards tracking. Among the issues he cited were implementation of the DOI business model including the future financial basis of the DOI system; increased involvement of interested communities; and defining and implementing metadata schemes. Amplification of the points covered by Paskin and references to related material may be found online at http://www.dlib.org/dlib/may99/05paskin.html [3].

Brian Green, Executive Director of Book Industry Communication (BIC), the trade body of the UK Publishers Association and EDItEUR, its international counterpart, concluded the morning session, speaking on “The INDECS Project: and the Role of Librarians in Identifier Development,” in which he stressed the importance of interoperability, noting that we cannot afford to create separate metadata schemes for separate applications. Before discussing the INDECS project, Green described briefly the interest of BIC and EDItEUR in the development of identifier standards. Stressing that unique identifiers are essential if e-commerce is to be successful, he described the concern of BIC and EDItEUR with identifier standards; EDI implementation guidelines; and the analysis of the typology of rights, important because a multimedia resource may have many separate elements, each with different rights. Green posited a broad definition of commerce as any transaction, regardless of whether financial gain is involved; hence, fair use is a transaction involving rights. Questions that must be answered regarding rights transactions include: what is the object? who is requesting it? and what do they want to do with it? Among the concerns that must be addressed is reconciling the complexities of creations and rights issues with computer systems that are not good at dealing with ambiguity.

Turning to INDECS, of which EDItEUR is a partner, Green noted that the INDECS project is a fifteen-month-old fast-track program. Backed by the book, serials, and record industries as well as by creators and societies, INDECS is working to develop standards supporting network commerce in intellectual property. The project’s deliverables include developing a generic data model for intellectual property and, by mapping other data schemes to that model, helping to establish interoperability among them. Among the initiatives being undertaken by INDECS is development of guidelines for reconciling the distinct identifiers for individual and corporate persons (as creators, disseminators, and users) used in different identification systems [including those created by library groups such as the International Federation of Library Associations and Institutions (IFLA) and industry groups such as the International Confederation of Societies of Authors and Composers (CISAC)]. Noting that INDECS is actively collaborating with the Dublin Core Initiative, Green concluded with a plea for libraries and rights owners to work together and urged his audience to get involved in these issues. Further information on the work of INDECS may be found at its Web site, http://www.indecs.org/; information on EDItEUR and BIC is available from their respective Web sites, http://www.editeur.org and http://www.bic.org.uk.

In the keynote address for the afternoon session, “Role of Identifiers in Scholarly Communication,” William Y. Arms, Professor of Computer Science, Cornell University, and former Vice President of the Corporation for National Research Initiatives (CNRI), began his remarks by noting that many of the issues being considered today were also under discussion two years ago. Then, as now, discussions focused on identifiers in collection management and preservation; the need for identifiers based on content, not location, was clear; and preservation techniques, whether based on replication of bits, migration of content, or emulation of computer systems, all rely on digital IDs.

Reviewing briefly the development of the URN, Arms noted that although there is general agreement on the characteristics of the URN as outlined by Paskin in the morning session, there have been heated discussions on semantic versus non-semantic naming conventions and on DNS-based versus separate protocols for URN resolution. He observed that the acceptance of the URN today is much less than might have been expected when the concept was introduced five years ago. No application has been sufficiently vital to justify the effort of deploying URNs broadly.

Arms went on to consider recent discussions on “reference linking,” that is, how to move from the information in a standard citation to the thing to which the citation refers, outlining both local and central models that might be used to achieve resolution. He noted that “selective resolution” was the one attribute that was discussed five years ago. Sometimes referred to as the “Harvard Problem,” the question of selective resolution arises when there are many copies of a work with the same identifier. The client may need to select among these based on performance, economics (subscription availability), or other user requirements.
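Selective resolution as described above can be sketched in a few lines. This is an assumption-laden illustration rather than anything Arms presented: the `Copy` type, the use of measured latency as the "performance" criterion, the provider names, and the policy of filtering by subscription before picking the fastest copy are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    url: str
    provider: str    # who hosts this copy of the work
    latency_ms: int  # stand-in for "performance"


def select_copy(copies, subscriptions):
    """Pick the fastest copy among those the client is entitled to use."""
    allowed = [c for c in copies if c.provider in subscriptions]
    if not allowed:
        raise LookupError("no copy available under the client's subscriptions")
    return min(allowed, key=lambda c: c.latency_ms)


# Two copies of one work that share a single identifier.
copies = [
    Copy("http://publisher.example/work", "publisher", 40),
    Copy("http://aggregator.example/work", "aggregator", 120),
]
```

A client subscribed only to the aggregator gets the aggregator copy even though it is slower; a client with both subscriptions gets the publisher copy. A purely local model would run this selection at the client's institution, while a central model would run it at a shared resolver.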

In conclusion, Arms offered some thoughts on the future of digital identifiers, noting that in his opinion a considerable opportunity was lost by the failure of Netscape to provide browser support for the URN. Without this support, URN schemes have been slow to gain acceptance whereas specialized name schemes, which satisfy specialized needs but create long-term problems, have appeared. With regard to interoperability, he suggested that although there are many systems of identifiers, interoperability will be achieved. As to data models, the data models in use in today’s digital libraries are ad hoc and special purpose, leading to ad hoc systems of identifiers. However, he noted that the movement is toward more formal data models and cited a number of examples including IFLA’s functional requirements for bibliographic records [4]; metadata frameworks such as the Warwick [5] and INDECS models [6]; and object models such as those associated with Cornell’s Flexible and Extensible Digital Object and Repository Architecture (FEDORA) [7] and the Making of America II Project White Paper [8].

Helen Barsky Atkins, Director, Database Development, Institute for Scientific Information, followed with a look at real-world issues faced by a publisher in connection with identifiers in her talk, “Making Digital Links Work in a Commercial Environment,” which she noted might appropriately have been subtitled: “Why we don’t use standard identifiers and why that may not be possible in the future.” Atkins focused on three areas in her talk: the process of data capture at ISI; links in the ISI product, Web of Science (an index service covering approximately 8,500 journals); and some challenges posed by electronic publishing.

Describing in detail the data capture process used by ISI, Atkins characterized the ISI method as a purely pragmatic approach, by which ISI creates unique proprietary keys for each article indexed or referenced in the Web of Science. These keys are then used to create internal links to associated bibliographic data from an article’s references to the source records, and from source records to citing documents and to related records that share one or more records in common. When keys in the file match, a link is created. As new data are added to the file, new internal links are created. The Web of Science also includes external links from bibliographic data to full text articles. In setting up these external links, ISI looked at a number of standard identifiers including the Publisher Item Identifier (PII), the Digital Object Identifier (DOI), and the Serial Item and Contribution Identifier (SICI) but concluded that none were sufficiently developed at that time to incorporate into their product. Instead, external links in the Web of Science are created from information supplied by the publishers including bibliographic data, the URL and the identifier used by the publisher for the material. From this information, ISI builds its key. In this way, if necessary, separate links for the same full text article can be established, both to the publisher Web site and to aggregator Web sites, thus facilitating rights management, with links being turned on or off depending on a user’s profile.
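The key-matching idea can be illustrated with a short sketch. The report does not describe how ISI's proprietary keys are actually built, so the key format below (normalized journal title, volume, first page, year) is purely an assumption, as are the function names and record identifiers.

```python
from collections import defaultdict


def make_key(journal, volume, first_page, year):
    """Hypothetical key derived from bibliographic data (not ISI's format)."""
    normalized = "".join(ch for ch in journal.upper() if ch.isalnum())
    return f"{normalized}:{volume}:{first_page}:{year}"


def build_links(source_records, cited_references):
    """Link each source record to the documents that cite it.

    source_records: dict mapping key -> source record id
    cited_references: iterable of (citing record id, key of the cited item)
    A link is created only when a reference key matches a source key.
    """
    links = defaultdict(list)
    for citing_id, key in cited_references:
        if key in source_records:
            links[source_records[key]].append(citing_id)
    return dict(links)


key = make_key("Libr. Coll. Acq. & Tech. Serv.", 24, 267, 2000)
sources = {key: "record-A"}
references = [
    ("record-B", key),  # matches a source key, so a link is created
    ("record-C", make_key("Another Journal", 1, 1, 1999)),  # no match
]
links = build_links(sources, references)
```

Rerunning `build_links` after new records arrive yields the new internal links, which corresponds to the statement that links are created as data are added to the file. The sketch also hints at why the approach is pragmatic: matching depends entirely on how consistently the bibliographic data are normalized.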

Among the challenges related to electronic publishing described by Atkins were those arising from the existence of titles appearing in both print and electronic formats. Establishing which format should be considered “authoritative,” or most complete, is not always clear-cut and varies from title to title. Links to the full text are dependent on the version indexed (that is, the version considered “most important” by the publisher); pagination from the print version may need to be added later to the electronic version. In closing, Atkins noted that changes in electronic serials publishing, including the appearance of articles in electronic format before being assigned a specific volume or issue, could change the way ISI does its linking and open the possibility that in the future ISI will need to make use of standard identifiers such as the DOI instead of relying on its own key.

Stuart Weibel, Senior Research Scientist, OCLC, concluded the presentations, speaking on “General Issues Affecting Identifier Systems.” Weibel stressed that “Naming is about policy; it is not about technology. Technology will not save us; policy will.” Recalling some of the problems and early disagreements in the history of the URN, Weibel suggested their origin lay in the difficulties of grafting new technology onto an existing stable infrastructure, as represented by the DNS and the WWW. The result was to leave the URN as a “neutral” orphan, without a “champion.” According to Weibel, URLs have been deficient; but there has not been “enough pain” to attract development of the URN. With the exception of Persistent Uniform Resource Locators (PURL) and DOIs, there has not been an agency to push for their development.

Describing the development of the PURL as a response to the gridlock surrounding the continued development of the URN, Weibel then provided an overview of the PURL program. Stressing that persistence is a function of the commitment of the agency running PURL, he noted that the PURL represents a “minimalist URN,” making use of existing technology and is available free from OCLC. Responding to earlier statements that PURL allows only a single point of resolution, Weibel posited that there could be multiple resolution (from a list); or adaptive resolution, based on various criteria (for example, nearest server; cheapest server; subscription or membership server; or value-added server) although issues of authentication, privacy, and security will need to be addressed. As for the future of PURL, Weibel stated that PURL will be developed to the extent that there are unfilled needs for the library community and its constituencies.

Weibel then observed that naming by itself is not sufficient. Associating a name with the resource requires metadata. Issues that need to be resolved before digital identifiers can go forward include responsibility for defining and managing the semantics of metadata and the associated privacy and security issues. Weibel went on to contrast Dublin Core and INDECS, noting that Dublin Core is an international interdisciplinary open standard for defining core metadata standards for resource description. INDECS is a project to develop a common metadata framework to support e-commerce. There are similarities in the two projects, but INDECS, in addition to the resource discovery elements that form part of the Dublin Core, is also concerned with metadata elements for people and intellectual property agreements. Currently there are efforts underway to harmonize the two, the results of which should be much clearer by the end of 1999 [9]. In conclusion, Weibel underscored that the infrastructure for persistent names and supporting metadata must support many business models. There will be many models with requirements that overlap in some cases and conflict with others, but it is in the interest of all parties to adopt common conventions to support common requirements.

Links to the LITA Preconference speakers’ notes and slides, as available, will be placed on the LITA Web site (http://www.lita.org).

References

[1] Lynch C. Identifiers and their role in networked information applications. ARL: A Bimonthly Newsletter of Research Library Issues and Actions 1997;194. http://www.arl.org/newsltr/194/identifier.html.

[2] Sollins K, Masinter L. Informational request for comments: 1737. Functional requirements for uniform resource names. 1994; http://www.es.net/pub/rfcs/rfc1737.txt.

[3] Paskin N. DOI: current status and outlook. D-Lib Magazine 1999;5. http://www.dlib.org/dlib/may99/05paskin.html.

[4] IFLA Study Group on the Functional Requirements for Bibliographic Records. Functional requirements for bibliographic records: final report. München: Saur, 1998. http://www.ifla.org/VII/s13/frbr/frbr.pdf.

[5] Lagoze C, Lynch CA, Daniel R Jr. The Warwick framework: a container architecture for aggregating sets of metadata. Cornell University Computer Science Technical Report TR96-1593. June 1996. http://cs-tr.cs.cornell.edu/Dienst/UI/2.0/Describe/ncstrl.cornell/TR96-1593.

[6] Godfrey R, Bide M. Metadata model: third published version of INDECS. Model for distribution at July 1999 Conference [Draft] July 5, 1999. http://www.indecs.org/pdf/model3.pdf.

[7] Payette S, Lagoze C. Flexible and Extensible Digital Object and Repository Architecture (FEDORA). http://www2.cs.cornell.edu/payette/papers/ECDL98/FEDORA.html.

[8] Making of America II Project White Paper; http://sunsite.berkeley.edu/moa2/wp-v2.html.

[9] Bearman D, Rust R, Weibel S, et al. A common model to support interoperable metadata: progress report on reconciling metadata requirements from the Dublin Core and INDECS/DOI communities. D-Lib Magazine 1999. http://www.dlib.org/dlib/january99/bearman/01bearman.html.

Carolyn Larson
Science, Technology, & Business Division,
Library of Congress,
Washington, DC 20540, USA
E-mail address: [email protected] (C. Larson).

Understanding the licensing landscape: highlights of the ACRL preconference

In addition to providing an overview of licensing issues, this program offered specific examples of negotiation situations and contract language, ways of tracking license agreements, and maintaining compliance. A common theme addressed by all speakers was that licensing norms are currently evolving in the marketplace, and librarians should assert their free market role by negotiating licenses that maintain rights and norms established in the print realm, and incorporate relevant standards.

The program consisted of three toolkit sessions. The first session, called “Sense and Licensability,” began with Ivy Anderson, Coordinator for Digital Acquisitions, Harvard University Library, who placed the license in a functional and legal framework. Functionally, a license is a business agreement defining an economic or market relationship. Legally, it is a type of contract law, and falls under state jurisdiction. The terms of a license are not
