Metadata for Digital Libraries: A Functional Approach
Sandra Payette, Digital Library Research Group, Cornell University (payette@cs.cornell.edu)
Cornell Digital Imaging Workshop, October 21, 1998



Metadata

[Figure: three examples of metadata about one work: a descriptive record (CREATOR: Plato; TITLE: The Republic), image file storage locations (Image 1 and Image 2 on cdrom 1; Image 3 on cdrom 2), and an access control list.]

Metadata is structured data about data that imposes order on a disordered information universe.
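As an illustration only, the three kinds of metadata in this example can be kept side by side in one record. This is a minimal Python sketch; the class name, field names, and the access-control entry are hypothetical, since the slide does not show the ACL's contents.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObjectMetadata:
    """One work's metadata, kept as three separate kinds (hypothetical structure)."""

    # Descriptive metadata: what the work is and who created it.
    descriptive: dict[str, str]
    # Storage metadata: which medium holds each page image.
    image_storage: dict[str, str]
    # Access control list: principal -> set of permitted operations.
    access_control: dict[str, set[str]] = field(default_factory=dict)

    def images_on(self, medium: str) -> list[str]:
        """List the images stored on one medium, in insertion order."""
        return [image for image, location in self.image_storage.items() if location == medium]

    def can(self, principal: str, operation: str) -> bool:
        """True if the access control list grants the operation to the principal."""
        return operation in self.access_control.get(principal, set())


republic = DigitalObjectMetadata(
    descriptive={"CREATOR": "Plato", "TITLE": "The Republic"},
    image_storage={"Image 1": "cdrom 1", "Image 2": "cdrom 1", "Image 3": "cdrom 2"},
    access_control={"public": {"read"}},  # hypothetical ACL entry
)

print(republic.images_on("cdrom 1"))    # ['Image 1', 'Image 2']
print(republic.can("public", "write"))  # False
```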


Many Types of Metadata

• Descriptive

• Structural

• Terms and conditions

• Administrative

• Content ratings

• Provenance

• Relationship


Basic Functions We Must Support

• Resource Discovery

• Access and Use

• Preservation and Administration
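One way to read this functional approach is as a lookup from each function to the metadata it depends on, following the section headings of this presentation (descriptive metadata for resource discovery, structural metadata for access and use, administrative metadata and persistent identifiers for preservation and administration). A minimal sketch; the names are hypothetical:

```python
from enum import Enum


class Function(Enum):
    RESOURCE_DISCOVERY = "Resource Discovery"
    ACCESS_AND_USE = "Access and Use"
    PRESERVATION_AND_ADMINISTRATION = "Preservation and Administration"


# Metadata focus for each function, following this presentation's section headings.
FOCUS = {
    Function.RESOURCE_DISCOVERY: ["Descriptive"],
    Function.ACCESS_AND_USE: ["Structural"],
    Function.PRESERVATION_AND_ADMINISTRATION: ["Administrative", "Persistent identifiers"],
}


def metadata_needed(functions: list[Function]) -> list[str]:
    """Collect the metadata focuses for the functions a system must support, without duplicates."""
    needed: list[str] = []
    for function in functions:
        for kind in FOCUS[function]:
            if kind not in needed:
                needed.append(kind)
    return needed


print(metadata_needed([Function.RESOURCE_DISCOVERY, Function.ACCESS_AND_USE]))
# ['Descriptive', 'Structural']
```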


Resource Discovery:

Focus on Descriptive Metadata


Metadata for Resource Discovery

• Catalogs
  – OPAC / MARC records

• Indexes
  – Structured descriptive records (e.g., Dublin Core)
  – Abstracts
  – Full-text surrogates (e.g., via OCR)


Challenges

• Impracticality of large-scale traditional cataloging
  – time-consuming, labor-intensive, requires special skills
  – limited coverage: only “selected” items

• Problems with resource discovery
  – full-text indexing is ineffective (false hits, irrelevant results, overload)
  – full-text approaches are not useful for non-textual data (e.g., audio, video, executable programs)


One Solution: Simple Descriptive Surrogates

• Easy to create

• Applicable across domains

• Applicable to different genres of objects

• Allows interoperability among robots, indexers, and search clients


Dublin Core Element Set

• Good baseline descriptive record

• Can exist alongside other specialized metadata

• Common ground for discovery across disparate resources

• No specialized skills required

• Flexibility through qualifiers

Source: http://www.purl.org/Metadata/dublin_core/


Dublin Core: 15 Elements

• Title: name given to the work by the author

• Author or Creator: person(s) responsible for the intellectual content

• Subject and Keywords: the topic of the work, keywords, or formal classification schemes

• Description: textual description of the content (abstract, prose describing an image, etc.)

• Publisher: the organization making the work available in its present form

• Other Contributor: person(s) other than the author who have made significant contributions to the intellectual content

• Date: the date the work was made available

• Resource Type: category of the resource

• Format: data representation of the resource

• Resource Identifier: unique identification string (e.g., URL, URN, ISBN)

• Source: object from which this object is derived (if applicable)

• Language: language of the intellectual content of the object

• Relation: relationship of the object to other objects or collections

• Coverage: spatial locations and temporal duration characteristics

• Rights Management: a pointer to a copyright notice, a rights management statement, or a rights server
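Because the element set is fixed, a baseline Dublin Core record can be checked mechanically. The sketch below is a hypothetical validator; it uses the short, lower-cased element labels seen in the DC.* META tags (creator, subject, and so on) and treats every element as optional and repeatable, as Dublin Core does, so each value is a list.

```python
# Short, lower-cased labels for the 15 elements listed above.
DC_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


def validate_dc(record: dict[str, list[str]]) -> list[str]:
    """Return a list of problems; an empty list means the record looks valid.

    Every element is optional and repeatable, so each value is a list of strings.
    """
    problems = []
    for name, values in record.items():
        if name not in DC_ELEMENTS:
            problems.append(f"unknown element: {name}")
        elif not values or any(not v.strip() for v in values):
            problems.append(f"empty value for: {name}")
    return problems


record = {
    "title": ["Cornell Digital Library Research Group"],
    "creator": ["Lagoze, Carl", "Payette, Sandra"],
    "date": ["1998-05-15"],
    "author": ["Plato"],  # not one of the element labels
}
print(validate_dc(record))  # ['unknown element: author']
```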


Dublin Core in HTML META Tags

<html>
<head>
<title>Cornell Digital Library Research Group</title>
<META name="DC.subject" content="digital library research">
<META name="DC.subject" content="networked object description">
<META name="DC.publisher" content="Cornell University">
<META name="DC.creator" content="Lagoze, Carl">
<META name="DC.creator" content="Payette, Sandra, payette@cs.cornell.edu">
<META name="DC.title" content="Cornell Digital Library Research Group">
<META name="DC.date" content="1998-05-15">
<META name="DC.form" scheme="IMT" content="text/html">
<META name="DC.language" scheme="ISO639" content="en">
<META name="DC.identifier" scheme="URL" content="http://www2.cs.cornell.edu/NCSTRL/CDLRG/cdlrg.htm">
</head>
<body>
<IMG SRC="/mydir/mysubdir/mypicture.gif" WIDTH=208 HEIGHT=216>
</body>
</html>

Source: http://www.w3.org/TR/REC-html40/
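Because the META tags follow a fixed naming pattern, a robot or indexer can harvest them with an ordinary HTML parser. A minimal sketch using Python's standard html.parser module; the sample page is a shortened copy of the one above, and the parser ignores the scheme attribute for simplicity.

```python
from html.parser import HTMLParser


class DublinCoreMetaParser(HTMLParser):
    """Collect <META name="DC.xxx" content="..."> tags into {element: [values]}."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: dict[str, list[str]] = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so "META" arrives as "meta".
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        if name.lower().startswith("dc.") and content is not None:
            element = name[3:].lower()  # "DC.subject" -> "subject"
            self.elements.setdefault(element, []).append(content)


page = """<html><head><title>Cornell Digital Library Research Group</title>
<META name="DC.subject" content="digital library research">
<META name="DC.subject" content="networked object description">
<META name="DC.date" content="1998-05-15">
<META name="DC.language" scheme="ISO639" content="en">
</head><body></body></html>"""

parser = DublinCoreMetaParser()
parser.feed(page)
print(parser.elements["subject"])
# ['digital library research', 'networked object description']
```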


Warwick Framework

• Developed by the Dublin Core community

• Broader framework to accommodate diverse metadata schemes

• Encourages community-specific definition and administration of metadata

• Modularity supports interoperability among:
  – content providers
  – catalogers and indexers
  – automated resource discovery systems


Warwick Framework Container

[Figure: a Warwick Framework container holding three packages: a Dublin Core package, a package of other descriptive metadata, and a package holding a reference (URI) to a MARC record. Callout: a simple package is a typed metadata set.]
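The container idea can be sketched as a list of independent packages, each tagged with the metadata scheme it carries; a client selects the packages it understands and skips the rest. This is a hypothetical model for illustration, not the Warwick Framework's own specification, and the MARC URI is a placeholder.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class MetadataSet:
    """Simple package: a typed metadata set, e.g. Dublin Core."""
    schema: str
    fields: dict[str, str]


@dataclass
class ReferencePackage:
    """Package that points to metadata held elsewhere, e.g. a MARC record by URI."""
    schema: str
    uri: str


Package = Union[MetadataSet, ReferencePackage]


@dataclass
class Container:
    packages: list[Package] = field(default_factory=list)

    def add(self, package: Package) -> None:
        self.packages.append(package)

    def by_schema(self, schema: str) -> list[Package]:
        # Clients pick out the packages they understand and ignore the rest.
        return [p for p in self.packages if p.schema == schema]


container = Container()
container.add(MetadataSet("Dublin Core", {"title": "The Republic", "creator": "Plato"}))
container.add(ReferencePackage("MARC", "http://example.org/marc/12345"))  # placeholder URI
print([p.schema for p in container.packages])  # ['Dublin Core', 'MARC']
```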


WWW Infrastructure Evolving in this Direction

• Dublin Core submitted to the IETF as an RFC
  – ftp://ftp.isi.edu/in-notes/rfc2413.txt

• Resource Description Framework (RDF)
  – http://www.w3.org/RDF/

• Extensible Markup Language (XML)
  – http://www.w3.org/XML/


Resource Description Framework (RDF)

• Influenced by the Warwick Framework, among others

• Enables interoperability between applications that exchange metadata

• Mix and match of metadata elements from different schemas

• Uses XML as its transfer syntax


A Simple RDF Model

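[Figure: a simple RDF model. The resource www2.cs.cornell.edu/CDLRG/doc1 has arcs labeled DC:Creator, DC:Publisher, and QCSchema:Rating (a rating schema at www.xxx.org/rate), leading to value nodes labeled A, B, MyRating, and YourRating.]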
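The model reduces to a set of (subject, predicate, object) statements. In this sketch the creator and publisher values come from the XML example that follows, the rating values are treated as plain strings for simplicity, and the prefixes are only labels, not resolved namespaces:

```python
# Each RDF statement is a (subject, predicate, object) triple.
DOC = "http://www2.cs.cornell.edu/CDLRG/doc1"

triples = {
    (DOC, "DC:Creator", "Sandy Payette"),
    (DOC, "DC:Publisher", "Cornell DLRG"),
    (DOC, "QCSchema:Rating", "MyRating"),
    (DOC, "QCSchema:Rating", "YourRating"),
}


def objects(subject: str, predicate: str) -> list[str]:
    """All values of one property of one resource, sorted for stable output."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def vocabularies(subject: str) -> list[str]:
    """Schema prefixes whose properties describe the resource (mix and match)."""
    return sorted({p.split(":", 1)[0] for s, p, _ in triples if s == subject})


print(objects(DOC, "QCSchema:Rating"))  # ['MyRating', 'YourRating']
print(vocabularies(DOC))                # ['DC', 'QCSchema']
```

Because the statements are just data, a second community's vocabulary (here the rating schema) can be attached to the same resource without changing the Dublin Core statements.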


RDF Expressed in XML

The DC namespace refers to the Dublin Core Element Set:

<?xml:namespace name="http://www.purl.org/Metadata/dublin_core/" as="DC"?>
<?xml:namespace name="http://www.w3.org/Schemas/RDF/" as="RDF"?>
<RDF:Serialization>
  <RDF:Assertions href="http://www2.cs.cornell.edu/CDLRG/doc1">
    <DC:Creator>Sandy Payette</DC:Creator>
    <DC:Publisher>Cornell DLRG</DC:Publisher>
  </RDF:Assertions>
</RDF:Serialization>
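The serialization above uses draft syntax from 1998. For comparison, here are the same two statements in the RDF/XML syntax that W3C later standardized, with the Dublin Core 1.1 namespace, read back into triples using Python's xml.etree.ElementTree. The extractor handles only literal-valued properties of rdf:Description and is a sketch, not a full RDF parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The slide's statements in the later standardized RDF/XML syntax.
DOC1 = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www2.cs.cornell.edu/CDLRG/doc1">
    <dc:creator>Sandy Payette</dc:creator>
    <dc:publisher>Cornell DLRG</dc:publisher>
  </rdf:Description>
</rdf:RDF>"""


def extract_triples(xml_text: str) -> list[tuple[str, str, str]]:
    """Read literal-valued properties of rdf:Description elements as triples."""
    root = ET.fromstring(xml_text)
    triples = []
    for description in root.iter(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree reports namespaced tags as "{namespace-uri}localname".
            triples.append((subject, prop.tag, (prop.text or "").strip()))
    return triples


for triple in extract_triples(DOC1):
    print(triple)
```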


RDF: Why is it important?

• Market demand for metadata deployment

• Software infrastructure will be ubiquitous (e.g., free in browsers, servers, proxies, editors)

• RDF is a general-purpose framework that provides structured, human-readable, and machine-understandable metadata for the web

• Allows stakeholder communities to independently develop, maintain, and reuse vocabularies


Access and Use

Focus on Structural Metadata


Structural Metadata

• What is it? Data that:
  – defines structure within documents
  – aggregates images into meaningful entities
  – correlates document components to image files
  – organizes a collection of objects

• Where is it?
  – ASCII text files in directories
  – relational databases
  – embedded in documents or surrogates (e.g., SGML)
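As a minimal sketch of the first kind of storage listed, an ASCII file can aggregate page images into named components and let software correlate any image back to its component. The file layout (one tab-separated "component, image file" pair per line, in reading order) and the file names are hypothetical.

```python
from typing import Optional

# Hypothetical ASCII structure file: one "component<TAB>image file" pair per line,
# listed in reading order.
STRUCTURE_TXT = (
    "titlepage\tpage0001.tif\n"
    "titlepage\tpage0002.tif\n"
    "introduction\tpage0003.tif\n"
    "chapter1\tpage0004.tif\n"
    "chapter1\tpage0005.tif\n"
)


def load_structure(text: str) -> dict[str, list[str]]:
    """Aggregate image files into named components; dicts keep insertion (reading) order."""
    components: dict[str, list[str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        component, image = line.split("\t")
        components.setdefault(component, []).append(image)
    return components


def component_of(components: dict[str, list[str]], image: str) -> Optional[str]:
    """Correlate an image file back to the component that contains it."""
    for component, images in components.items():
        if image in images:
            return component
    return None


book = load_structure(STRUCTURE_TXT)
print(list(book))                          # ['titlepage', 'introduction', 'chapter1']
print(component_of(book, "page0004.tif"))  # chapter1
```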


First... A Data Model

Data models mirror natural attributes and relationships of real-world objects

[Figure: entity-relationship diagram of a book, relating Front matter, Table of Contents, Chapter, Page, and Index, with relationships marked 0:1 or 1:N.]


“Binding” Document Images with SGML

<!DOCTYPE EBIND PUBLIC "-//UC Berkeley//DTD ebind.dtd (Electronic Binding (Ebind))//EN" [
<!ENTITY % birch PUBLIC "-//UC Berkeley//ENTITIES Birch-tree fairy book (Page Images)//EN">
%birch;
]>
<ebind type="book">
<front>
<page><image entityref="birch001" seqno="1" nativeno="i"></page>
<page><image entityref="birch002" seqno="2" nativeno="ii"></page>
<page><image entityref="birch003" seqno="3" nativeno="iii"></page>
<page><image entityref="birch004" seqno="4" nativeno="iv"></page>
<div0 type="titlepage">
<page><image entityref="birch005" seqno="5" nativeno="v"></page>
<page><image entityref="birch006" seqno="6" nativeno="vi"></page>
</div0>
<div0 type="introduction">
<head>Introductory note</head>
<page><image entityref="birch007" seqno="7" nativeno="vii"></page>
</div0>

Source: http://sunsite.berkeley.edu/Ebind/
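The Ebind markup correlates each page image with both a sequence number (seqno) and what appears to be the page number printed in the book (nativeno). A rough sketch that pulls those attributes out with regular expressions, since SGML's omitted end tags (there is no </image>) would make an XML parser reject the excerpt; the helper names are hypothetical:

```python
import re
from typing import Optional

# Excerpt of the Ebind markup above (SGML, so <image> has no end tag).
EBIND_EXCERPT = """
<front>
<page><image entityref="birch001" seqno="1" nativeno="i"></page>
<page><image entityref="birch002" seqno="2" nativeno="ii"></page>
<page><image entityref="birch003" seqno="3" nativeno="iii"></page>
<page><image entityref="birch004" seqno="4" nativeno="iv"></page>
<div0 type="titlepage">
<page><image entityref="birch005" seqno="5" nativeno="v"></page>
<page><image entityref="birch006" seqno="6" nativeno="vi"></page>
</div0>
<div0 type="introduction"><head>Introductory note</head>
<page><image entityref="birch007" seqno="7" nativeno="vii"></page>
</div0>
"""

IMAGE_TAG = re.compile(r"<image\b([^>]*)>", re.IGNORECASE)
ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')


def page_images(sgml: str) -> list[dict[str, str]]:
    """Collect each image tag's attributes, returned in seqno (page-turning) order."""
    pages = [dict(ATTRIBUTE.findall(m.group(1))) for m in IMAGE_TAG.finditer(sgml)]
    return sorted(pages, key=lambda attrs: int(attrs["seqno"]))


def next_image(pages: list[dict[str, str]], entityref: str) -> Optional[str]:
    """Entity of the page after entityref, or None at the end of the sequence."""
    order = [p["entityref"] for p in pages]
    index = order.index(entityref)  # raises ValueError for an unknown entity
    return order[index + 1] if index + 1 < len(order) else None


pages = page_images(EBIND_EXCERPT)
# Map printed page numbers (nativeno) to image entities.
by_native = {p["nativeno"]: p["entityref"] for p in pages}
print(by_native["vii"])                # birch007
print(next_image(pages, "birch004"))   # birch005
```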


Finding Aids in SGML

• Encoded Archival Description (EAD)
  – SGML markup of descriptive access tools (inventories, registers, indexes, and guides)
  – provides more detail about a collection than a typical catalog record
  – facilitates access: “drill down” into the collection
  – potential international standard
  – maintained jointly by the Library of Congress and the Society of American Archivists (SAA)

Source: http://www.loc.gov/rr/ead/eadhome.html


Preservation and Administration

Focus on Administrative Metadata and Persistent Identifiers


Administrative Metadata

• Information for managing images over time
  – relocation
  – migration (new formats)
  – copyright tracking
  – archiving of objects and services

• Where is it?
  – file headers (to help prevent orphaned images)
  – external databases (e.g., a relational database)
  – separate files stored with the images


Create a Preservation Audit Trail

• Image file attributes: formats, versions, compression

• Image attributes: resolution, bit depth, orientation

• Process data: creation date/time, equipment used

• Rights management data: expiration dates, copyright information, source statements
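An audit trail can be kept as an append-only list of entries carrying the attribute groups above, so that a migration adds a new version rather than overwriting the old one. A minimal sketch; the field names and all sample values (formats, dates, equipment) are hypothetical:

```python
from dataclasses import dataclass, replace
from datetime import date, datetime
from typing import Optional


@dataclass(frozen=True)
class ImageAuditEntry:
    """One entry in a preservation audit trail (hypothetical field names)."""

    # Image file attributes
    file_format: str
    version: int
    compression: str
    # Image attributes
    resolution_dpi: int
    bit_depth: int
    orientation: str
    # Process data
    created: datetime
    equipment: str
    # Rights management data
    rights_expire: Optional[date]
    copyright_info: str
    source_statement: str


def current_version(trail: list[ImageAuditEntry]) -> ImageAuditEntry:
    """The most recent version recorded in the trail."""
    return max(trail, key=lambda entry: entry.version)


def rights_expired(entry: ImageAuditEntry, today: date) -> bool:
    """True once the recorded rights expiration date has passed."""
    return entry.rights_expire is not None and today > entry.rights_expire


master = ImageAuditEntry(
    file_format="TIFF", version=1, compression="none",
    resolution_dpi=600, bit_depth=8, orientation="portrait",
    created=datetime(1998, 10, 1, 9, 30), equipment="flatbed scanner",
    rights_expire=date(2003, 12, 31), copyright_info="example rights holder",
    source_statement="example source collection",
)
# A format migration appends a new entry; the original entry is kept unchanged.
migrated = replace(master, file_format="PNG", version=2, compression="Deflate",
                   created=datetime(2001, 5, 2, 14, 0), equipment="conversion software")
trail = [master, migrated]

print(current_version(trail).file_format)        # PNG
print(rights_expired(master, date(2004, 1, 1)))  # True
```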


Persistent Identifiers

• Globally unique names

• Persistent: names are permanent and lasting

• Used in resolution services to locate the object (locations change over time).

Example: in the unique identifier cnri.dlib/april97-payette, “cnri.dlib” is the naming authority and “april97-payette” is the item name. Compare a location-based URL: http://www.somewebserver.org/somedirectory/somefile
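A resolution service separates the permanent name from the changeable location. In this toy sketch the naming authority is everything before the first "/", as in the example identifier, and the second URL is a hypothetical new location:

```python
def split_identifier(identifier: str) -> tuple[str, str]:
    """Split "authority/item" into its naming authority and item name."""
    authority, sep, item = identifier.partition("/")
    if not sep or not authority or not item:
        raise ValueError(f"expected 'naming-authority/item-name', got {identifier!r}")
    return authority, item


class ResolutionService:
    """Toy resolver: a persistent name maps to the object's current location."""

    def __init__(self) -> None:
        self._locations: dict[str, str] = {}

    def register(self, identifier: str, url: str) -> None:
        split_identifier(identifier)  # reject malformed names
        if identifier in self._locations:
            raise ValueError(f"{identifier!r} is already registered; names are never reused")
        self._locations[identifier] = url

    def relocate(self, identifier: str, new_url: str) -> None:
        """The object moved: update its location, never its name."""
        if identifier not in self._locations:
            raise KeyError(identifier)
        self._locations[identifier] = new_url

    def resolve(self, identifier: str) -> str:
        return self._locations[identifier]


service = ResolutionService()
name = "cnri.dlib/april97-payette"
service.register(name, "http://www.somewebserver.org/somedirectory/somefile")
service.relocate(name, "http://www.example.org/archive/somefile")  # hypothetical new location
print(split_identifier(name))  # ('cnri.dlib', 'april97-payette')
print(service.resolve(name))   # http://www.example.org/archive/somefile
```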


Identifiers: Current Initiatives

• IETF Uniform Resource Names (URN)
  – specification of the URN framework
  – requirements for resolution systems
  – syntax definition

• Existing systems
  – CNRI’s Handle System
  – OCLC PURLs
  – DOI Initiative
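The URN syntax definition can be illustrated with a simplified checker. This sketch is based on RFC 2141 (URN Syntax) but does not cover every rule, and the "hdl" namespace identifier in the usage line is hypothetical:

```python
import re

# Simplified form of the RFC 2141 syntax: "urn:" <NID> ":" <NSS>.
# NID: 1-32 letters, digits, or hyphens, not starting with a hyphen.
# NSS: letters, digits, a fixed set of punctuation, or %-escaped bytes.
URN_PATTERN = re.compile(
    r"^urn:"
    r"(?P<nid>[A-Za-z0-9][A-Za-z0-9-]{0,31})"
    r":"
    r"(?P<nss>(?:[A-Za-z0-9()+,\-.:=@;$_!*'/?#]|%[0-9A-Fa-f]{2})+)$",
    re.IGNORECASE,
)


def parse_urn(urn: str) -> tuple[str, str]:
    """Return (namespace identifier, namespace-specific string) or raise ValueError."""
    match = URN_PATTERN.match(urn)
    if match is None:
        raise ValueError(f"not a syntactically valid URN: {urn!r}")
    nid = match.group("nid")
    if nid.lower() == "urn":
        raise ValueError("'urn' is reserved and cannot be a namespace identifier")
    # The NID is case-insensitive, so normalize it to lower case.
    return nid.lower(), match.group("nss")


print(parse_urn("urn:ISBN:0451450523"))               # ('isbn', '0451450523')
print(parse_urn("urn:hdl:cnri.dlib/april97-payette"))  # ('hdl', 'cnri.dlib/april97-payette')
```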


Further reading

• IFLA: A Good List - http://www.nlc-bnc.ca/ifla/II/metadata.htm

• Lynch et al.: CNI Resource Discovery White Paper - http://www.cni.org/projects/nidr/nidr.html

• Lagoze: Resource Discovery in the Digital Age - http://www.dlib.org/dlib/june97/06lagoze.html

• Payette: Persistent Identifiers, RLG DigiNews - http://www.rlg.org/preserv/diginews/diginews22.html

• W3C: Metadata Overview - http://www.w3.org/Metadata