
Non-MARC Cataloging Standards Overview: TEI & EAD, MODS, METS, XML-based MARC

Eric Childress
OCLC
February 10, 2003

Overview

• Fundamentals
  – Metadata and content
  – Types of metadata
  – Document mark-up languages & character encoding
• The Big Picture
• Metadata formats:
  – MARC
  – MODS
  – METS
  – MIX
  – TEI
  – EAD
  – ONIX

Fundamentals: Metadata and content

1. Metadata embedded in content object
  • Title page / CIP
  • HTML header in HTML document
2. Metadata separate from content object
  • Book + catalog card
  • Book + MARC record
3. Metadata linked to content object
  • MARC record with URL for FTP object
4. Metadata embedded and linked
  • MARC record with URL for HTML document
  • PDF document linked to DC-XML record
  • Aggregation of discrete objects linked to record
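To make the "metadata embedded in content object" case concrete, the sketch below pulls the title and `<meta>` name/content pairs out of an HTML header using Python's standard `html.parser` module. The sample document, the `DC.creator` and `DC.date` names, and the helper names are illustrative assumptions, not part of the presentation.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect <title> text and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and "name" in attr_map and "content" in attr_map:
            self.metadata[attr_map["name"]] = attr_map["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in more than one chunk, so accumulate it.
        if self._in_title:
            self.metadata["title"] = self.metadata.get("title", "") + data


def extract_embedded_metadata(html_text):
    """Return the metadata embedded in the header of an HTML document."""
    collector = MetaTagCollector()
    collector.feed(html_text)
    return collector.metadata


# Hypothetical sample document; the element values are invented.
SAMPLE_HTML = """<html><head>
<title>Sample Document</title>
<meta name="DC.creator" content="Example, Author">
<meta name="DC.date" content="2003-02-10">
</head><body><p>Content goes here.</p></body></html>"""

if __name__ == "__main__":
    print(extract_embedded_metadata(SAMPLE_HTML))
```

The same content object could also be described by a separate record that links to it by URL, which is the "embedded and linked" case in the list above.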

Fundamentals: Types of metadata

• Administrative metadata: data about the metadata (e.g., record number)
• Descriptive metadata: description of the object for discovery and retrieval (e.g., title)
• Technical metadata: technical characteristics of the object (e.g., file size)

Fundamentals: Markup languages & character encoding

• Markup languages:
  – Address the structure of a document
  – Convey instructions to software that will process the text to:
    • Index the text for searching
    • Render the text (e.g., for screen display or print)
    • Transform the text for some output device (e.g., a voice synthesizer)
  – The markup is generally invisible to end users
• Extensible Markup Language (XML):
  – XML is a metalanguage: agencies define their own XML vocabularies to suit their tasks by creating Document Type Definitions (DTDs) or XML schemas
  – Data is kept separate from presentation instructions (which are recorded in a style sheet)
  – Offers just the right mix of flexibility and structure
• Character encoding:
  – Used for communicating text characters in a computing environment
  – Hundreds of character encoding standards exist
  – Character conversion is complex and expensive
• Unicode:
  – A single, “comprehensive” global encoding standard
  – Includes characters from the scripts of all major modern, most minor, and selected ancient languages
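A small sketch of why character conversion needs care: the same word occupies different byte sequences in a legacy single-byte encoding and in a Unicode encoding, and bytes read with the wrong encoding come out garbled. Latin-1 stands in here for "a legacy encoding" only because Python ships a codec for it; the function name is a hypothetical helper.

```python
def transcode(raw_bytes, source_encoding, target_encoding="utf-8"):
    """Decode bytes from one encoding and re-encode them in another."""
    return raw_bytes.decode(source_encoding).encode(target_encoding)


title = "Ministère"

# In Latin-1 every character of this word is one byte.
latin1_bytes = title.encode("latin-1")

# In UTF-8 the accented letter takes two bytes, so the byte length grows.
utf8_bytes = transcode(latin1_bytes, "latin-1")

# Reading UTF-8 bytes as if they were Latin-1 does not fail, it silently
# produces the wrong characters.
misread = utf8_bytes.decode("latin-1")

if __name__ == "__main__":
    print(len(latin1_bytes), len(utf8_bytes))
    print(misread)
```

Real conversions between library character sets involve many more cases than this (combining diacritics, multiple scripts, characters with no equivalent in the target set), which is where the complexity and expense noted above come from.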

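The Big Picture: Standards in a grid

[Figure: metadata standards plotted on a grid. One axis runs from simple description to rich description; the other runs from item to collections. Labels recoverable from the transcript: Dublin Core, RSLP, OAI set record, TEI, VRA Core, ONIX, MARC 8, CSDGM.]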

Library-related standards: MARC 21 (ISO 2709)

• MARC 21 (ISO 2709) MARC 8:
  – Library metadata communications format based on ISO 2709
  – Strengths:
    • Mature standard
    • Widely adopted by libraries (U.S., Canada, and beyond)
    • Large universe of records available
    • Wide choice of software vendors
  – Weaknesses (in the present & future):
    • Virtually unused outside of libraries
    • Field and record size limitations
    • Restricted range of scripts supported (MARC 8 repertoire only)
    • Limited ability to convey hierarchical & complex relationships, attributes
    • No ability to embed related objects (e.g., book cover GIF)
    • Cannot be directly processed by widely used web applications
• MARC 21 (ISO 2709) Unicode:
  – MARC 21 with Unicode character encoding
  – Limited to 16K characters equivalent to MARC 8 repertoire
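The size limitations follow from the ISO 2709 record layout: a 24-character leader, a directory of 12-character entries (3-character tag, 4-digit field length, 5-digit starting position), then the field data. A 4-digit field length and a 5-digit record length cap a field at 9,999 bytes and a record at 99,999 bytes. The sketch below builds and re-reads a minimal record under those rules; it is a simplified illustration, not a full MARC 21 implementation, and the sample field values are invented.

```python
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"
SUBFIELD_DELIMITER = b"\x1f"
LEADER_LENGTH = 24
DIRECTORY_ENTRY_LENGTH = 12  # 3-char tag + 4-digit length + 5-digit start


def build_record(fields):
    """Serialize (tag, data_bytes) pairs into one ISO 2709 record."""
    directory = b""
    body = b""
    for tag, data in fields:
        field = data + FIELD_TERMINATOR
        if len(field) > 9999:
            raise ValueError(f"field {tag} exceeds the 4-digit length limit")
        directory += f"{tag}{len(field):04d}{len(body):05d}".encode("ascii")
        body += field
    directory += FIELD_TERMINATOR
    base_address = LEADER_LENGTH + len(directory)
    record_length = base_address + len(body) + len(RECORD_TERMINATOR)
    if record_length > 99999:
        raise ValueError("record exceeds the 5-digit length limit")
    # Leader: length, status/type/level "nam", "a" = Unicode, counts "22",
    # base address of data, three blank positions, entry map "4500".
    leader = f"{record_length:05d}nam a22{base_address:05d}   4500"
    return leader.encode("ascii") + directory + body + RECORD_TERMINATOR


def parse_record(raw):
    """Read the directory and return the (tag, data_bytes) pairs."""
    base_address = int(raw[12:17])
    # The directory ends one byte before the base address (its terminator).
    directory = raw[LEADER_LENGTH:base_address - 1]
    fields = []
    for offset in range(0, len(directory), DIRECTORY_ENTRY_LENGTH):
        entry = directory[offset:offset + DIRECTORY_ENTRY_LENGTH]
        tag = entry[:3].decode("ascii")
        length = int(entry[3:7])
        start = base_address + int(entry[7:12])
        fields.append((tag, raw[start:start + length - 1]))
    return fields


# Hypothetical record: a control number and a title field ($a subfield).
SAMPLE_FIELDS = [
    ("001", b"demo0001"),
    ("245", b"10" + SUBFIELD_DELIMITER + b"aSample title"),
]

if __name__ == "__main__":
    record = build_record(SAMPLE_FIELDS)
    print(record[:LEADER_LENGTH])
    print(parse_record(record))
```

Because every offset is a fixed-width decimal number, the format is compact and easy to stream, but it also means the limits cannot be raised without breaking the layout.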

Library-related standards: MARC and XML

• MARC 21 and XML:
  – Library of Congress’ MARCXML:
    • LC’s schema provides a lossless conversion of MARC 21 (ISO 2709) to XML
    • LC’s XML framework positions MARCXML as both an end format and an intermediate format for conversion to non-MARC formats
  – XMLMARC, from Stanford University’s Lane Medical Library:
    • Developed before LC’s MARCXML schema
    • Ignores/simplifies some MARC 21 data
• UNIMARC and XML:
  – Ministère de la culture et de la communication (France), Board of Research and Technology:
    • BiblioML DTD for converting UNIMARC to XML
    • Conversion tools in development
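As an illustration of why the MARCXML conversion can be lossless, the sketch below writes a record using the MARCXML element names (`record`, `leader`, `controlfield`, `datafield`, `subfield`) in the `http://www.loc.gov/MARC21/slim` namespace. Tags, indicators, and subfield codes are carried over as attributes rather than reinterpreted. The function name and the sample values are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

MARCXML_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", MARCXML_NS)


def marc(name):
    """Qualify an element name with the MARCXML namespace."""
    return f"{{{MARCXML_NS}}}{name}"


def to_marcxml(leader, control_fields, data_fields):
    """Build a MARCXML <record> element from already-parsed MARC data.

    control_fields: list of (tag, value)
    data_fields: list of (tag, ind1, ind2, [(subfield_code, value), ...])
    """
    record = ET.Element(marc("record"))
    ET.SubElement(record, marc("leader")).text = leader
    for tag, value in control_fields:
        ET.SubElement(record, marc("controlfield"), {"tag": tag}).text = value
    for tag, ind1, ind2, subfields in data_fields:
        datafield = ET.SubElement(
            record, marc("datafield"), {"tag": tag, "ind1": ind1, "ind2": ind2}
        )
        for code, value in subfields:
            ET.SubElement(datafield, marc("subfield"), {"code": code}).text = value
    return record


# Hypothetical sample values.
sample_record = to_marcxml(
    "00000nam a2200000   4500",
    [("001", "demo0001")],
    [("245", "1", "0", [("a", "Sample title")])],
)

if __name__ == "__main__":
    print(ET.tostring(sample_record, encoding="unicode"))
```

Since nothing in the MARC structure is renamed or merged, the ISO 2709 form can be rebuilt from this XML, which is what makes MARCXML usable as an intermediate format.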

Library-related standards: MODS

• Metadata Object Description Schema (MODS):
  – Essentially MARC 21 recast in an XML-native framework
    • Text-based tags rather than numeric ones
    • Selected clusters of related MARC 21 attributes condensed into a single MODS element
  – MARC 21 converts readily to MODS, but MODS cannot be converted back to MARC 21 without loss
• Value of MODS:
  – A rich, library-metadata-oriented XML metadata schema
  – Optimized for from-MARC conversion of legacy records
  – Selectively “improves” some of MARC’s mechanisms for representing resource type
  – Well suited as a metadata format for OAI harvesting
  – Maintained by the same agency (LC) that maintains MARC 21
• Applications of MODS:
  – LC planning to convert 100K American Memory records
  – Minerva project, U of Chicago Press, California Digital Library, and others using or planning to use it for records for web sites and e-texts
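A rough sketch of the MARC-to-MODS direction: numeric tags and subfield codes become named elements such as `titleInfo/title` and `name/namePart` in the `http://www.loc.gov/mods/v3` namespace. The mapping below covers only two fields and is a deliberate simplification for illustration, not LC's published mapping; the input structure and function names are assumptions.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def q(name):
    """Qualify an element name with the MODS namespace."""
    return f"{{{MODS_NS}}}{name}"


def marc_to_mods(data_fields):
    """Convert a few MARC data fields to MODS elements.

    data_fields: list of (tag, [(subfield_code, value), ...]).
    Indicators are not even accepted as input here, which hints at why
    the reverse conversion cannot restore the original MARC record.
    """
    mods = ET.Element(q("mods"))
    for tag, subfields in data_fields:
        values = dict(subfields)
        if tag == "245":  # title statement
            title_info = ET.SubElement(mods, q("titleInfo"))
            if "a" in values:
                ET.SubElement(title_info, q("title")).text = values["a"]
            if "b" in values:
                ET.SubElement(title_info, q("subTitle")).text = values["b"]
        elif tag == "100":  # main entry, personal name
            name = ET.SubElement(mods, q("name"), {"type": "personal"})
            if "a" in values:
                ET.SubElement(name, q("namePart")).text = values["a"]
    return mods


# Hypothetical sample fields.
sample_mods = marc_to_mods([
    ("100", [("a", "Example, Author")]),
    ("245", [("a", "Sample title"), ("b", "a subtitle")]),
])

if __name__ == "__main__":
    print(ET.tostring(sample_mods, encoding="unicode"))
```

Any field the mapping does not handle is simply dropped in this sketch; a production conversion would need an explicit policy for such fields.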

Library-related standards: METS

• Metadata Encoding and Transmission Standard (METS):
  – Standard for encoding descriptive, administrative, structural, rights, and other data essential for retrieving, preserving, and serving up digital resources
  – Six modules (header, descriptive metadata, administrative metadata, file section, structural map, behavior section)
  – Header and structural map are required; descriptive, administrative, and behavior metadata may reside in the METS object or be external
• Value of METS:
  – Need for METS identified at DLF metadata experts meetings: varied local approaches to non-descriptive metadata were neither scaling well nor supporting interoperability between agencies
  – Can be used to collect digital resource metadata for submission to a repository, hold metadata in the repository, and inform user access applications
• Applications of METS:
  – LC using it for moving images, audio recordings, and folk life mixed media collections
  – OCLC DPR, RLG, Harvard, and the National Library of Wales exploring or using it for a variety of projects
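The module structure can be sketched as a skeleton document. The element names (`metsHdr`, `dmdSec`, `mdRef`, `fileSec`, `fileGrp`, `file`, `FLocat`, `structMap`, `div`, `fptr`) come from the METS schema; the identifiers, URLs, date, and function name are invented for the example, the XLink namespace URI shown is the W3C one and may differ in early METS drafts, and the administrative and behavior sections are left out for brevity.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)

XLINK_HREF = f"{{{XLINK_NS}}}href"


def m(name):
    """Qualify an element name with the METS namespace."""
    return f"{{{METS_NS}}}{name}"


def build_mets(file_url, mods_url):
    """Build a minimal METS wrapper for one file with external MODS metadata."""
    mets = ET.Element(m("mets"))

    # Header: data about the METS document itself.
    ET.SubElement(mets, m("metsHdr"), {"CREATEDATE": "2003-02-10T00:00:00"})

    # Descriptive metadata kept outside the METS object and referenced by URL.
    dmd = ET.SubElement(mets, m("dmdSec"), {"ID": "DMD1"})
    ET.SubElement(
        dmd, m("mdRef"), {"LOCTYPE": "URL", "MDTYPE": "MODS", XLINK_HREF: mods_url}
    )

    # File section: inventory of the content files.
    file_grp = ET.SubElement(ET.SubElement(mets, m("fileSec")), m("fileGrp"))
    file_el = ET.SubElement(file_grp, m("file"), {"ID": "FILE1", "MIMETYPE": "image/tiff"})
    ET.SubElement(file_el, m("FLocat"), {"LOCTYPE": "URL", XLINK_HREF: file_url})

    # Structural map: ties the files and metadata into a hierarchy.
    div = ET.SubElement(ET.SubElement(mets, m("structMap")), m("div"), {"DMDID": "DMD1"})
    ET.SubElement(div, m("fptr"), {"FILEID": "FILE1"})
    return mets


# Hypothetical URLs.
sample_mets = build_mets("http://example.org/page1.tif", "http://example.org/page1-mods.xml")

if __name__ == "__main__":
    print(ET.tostring(sample_mets, encoding="unicode"))
```

The `ID`, `DMDID`, and `FILEID` attributes are what let the structural map point at both the content files and their descriptions, whether those live inside the wrapper or elsewhere.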

Library-related standards: MIX

• Metadata for Images in XML (MIX):
  – Collaboration between LC and the NISO Technical Metadata for Digital Still Images Standards Committee
  – XML schema for a set of technical data elements required to manage digital image collections
  – Format for interchange and/or storage of the data specified in the NISO Draft Standard Data Dictionary: Technical Metadata for Digital Still Images (version 1.2)
  – Still in early development and testing phases
• Value of MIX:
  – Provides a common XML schema for expressing technical data particular to still and moving digital images
  – Can be used with other schemas such as METS and MODS as part of a comprehensive approach to managing and preserving digital images
• Applications of MIX:
  – OCLC DPR, LC, and others planning or testing
  – MIX still in a nascent stage of development and testing

E-text-related standard: TEI

• Text Encoding Initiative (TEI):
  – For complex markup of literary texts
  – Both SGML and (new) XML DTDs available
  – TEI “header” (TEIH) can be used as a descriptive metadata record
  – Maintenance agency: TEI Consortium
    • TEI Consortium has executive offices in Bergen, Norway, and is hosted at four university sites worldwide: the University of Bergen, Brown University, Oxford University, and the University of Virginia
    • Consortium maintains the “P4” Guidelines for Electronic Text Encoding and Interchange
• Value of TEI:
  – Designed to meet the needs of the scholarly research community (esp. in the humanities) for a variety of activities, including:
    • Adding in-line academic commentary to e-texts
    • Aiding research by supporting special indexing points, etc.
• Applications of TEI:
  – Widely used by major humanities electronic text collections such as CETH, the UVa e-text center, and many others
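Using the TEI header as a descriptive metadata record can be sketched as follows: the header sits inside the same file as the encoded text, and a cataloging tool can read the title statement without touching the text body. The element names follow the P4 XML layout (`TEI.2`, `teiHeader`, `fileDesc`, `titleStmt`); the sample document and function name are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily abbreviated TEI P4 document.
SAMPLE_TEI = """<TEI.2>
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Sample Poem: an electronic edition</title>
        <author>Example, Author</author>
      </titleStmt>
      <publicationStmt><p>Unpublished demonstration file.</p></publicationStmt>
      <sourceDesc><p>Transcribed from an invented print source.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <lg type="stanza"><l>First line of verse</l><l>Second line of verse</l></lg>
    </body>
  </text>
</TEI.2>"""


def tei_header_metadata(tei_xml):
    """Return title and author from the TEI header's title statement."""
    root = ET.fromstring(tei_xml)
    title_stmt = root.find("teiHeader/fileDesc/titleStmt")
    return {
        "title": title_stmt.findtext("title"),
        "author": title_stmt.findtext("author"),
    }


def count_verse_lines(tei_xml):
    """Count <l> elements, an example of markup supporting text analysis."""
    root = ET.fromstring(tei_xml)
    return len(root.findall("text/body//l"))


if __name__ == "__main__":
    print(tei_header_metadata(SAMPLE_TEI))
    print(count_verse_lines(SAMPLE_TEI))
```

The second function hints at the scholarly side of TEI: because structural units such as verse lines are marked up, they can serve as indexing points in their own right.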

Archives-related standard: EAD

• Encoded Archival Description (EAD):
  – A format for expressing electronic archival finding aids
  – Created by LC and the Society of American Archivists (SAA)
  – EAD DTD (Version 2002) is designed to function as both an SGML and an XML DTD
• Value of EAD:
  – Effectively an organized presentation of a collection of documents
    • EAD header carries metadata for the finding aid
    • Provides for simple or complex mark-up to support varying levels of indexing
    • Well suited for interweaving narrative with links to specific objects in a collection (either directly to the object or via a record for the object that may link to the object)
• Applications of EAD:
  – Conversion of existing paper finding aids to electronic form
  – Widely used by academic institutions and archives in North America
  – RLG Archival Resources database hosts copies of many EADs
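The way a finding aid interweaves hierarchy with links to objects can be sketched by walking the component elements of a small EAD document. The element names (`eadheader`, `archdesc`, `dsc`, `c01`, `c02`, `did`, `unittitle`, `dao`) follow the EAD 2002 DTD; the sample finding aid, its URL, and the function name are invented, and real finding aids carry far more description at each level.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, heavily abbreviated finding aid.
SAMPLE_EAD = """<ead>
  <eadheader>
    <eadid>demo-0001</eadid>
    <filedesc><titlestmt><titleproper>Guide to the Example Family Papers</titleproper></titlestmt></filedesc>
  </eadheader>
  <archdesc level="collection">
    <did><unittitle>Example Family Papers</unittitle></did>
    <dsc type="combined">
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c02 level="file">
          <did>
            <unittitle>Letters, 1901</unittitle>
            <dao href="http://example.org/letters-1901"/>
          </did>
        </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>"""

# Components are either unnumbered <c> or numbered <c01> through <c12>.
COMPONENT_TAG = re.compile(r"^c(0[1-9]|1[0-2])?$")


def list_components(ead_xml):
    """Return (depth, title, link-or-None) for each described component."""
    root = ET.fromstring(ead_xml)
    rows = []

    def walk(element, depth):
        for child in element:
            if COMPONENT_TAG.match(child.tag):
                dao = child.find("did/dao")
                link = dao.get("href") if dao is not None else None
                rows.append((depth, child.findtext("did/unittitle"), link))
                walk(child, depth + 1)

    walk(root.find("archdesc/dsc"), 1)
    return rows


if __name__ == "__main__":
    print(ET.fromstring(SAMPLE_EAD).findtext("eadheader/filedesc/titlestmt/titleproper"))
    for depth, title, link in list_components(SAMPLE_EAD):
        print("  " * depth + title, link or "")
```

Only the file-level component carries a link here, which mirrors common practice: upper levels describe the arrangement, and links appear where a digitized object exists.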

Publishing-related standard: ONIX

• ONIX International (Online Information Exchange):
  – Standard format for publishers to use to distribute electronic information about their publications
  – XML schema with Unicode encoding
  – Based on EPICS (EDItEUR Product Information Communication Standards)
  – Maintenance agency: EDItEUR, working with input from Book Industry Communication (BIC) and the Book Industry Study Group (BISG)
• Value of ONIX:
  – Designed to meet the needs of publishers, jobbers, and retail sellers for:
    • Richer book data online (including cover art)
    • A common data exchange format that frees players from the burden of costly, custom programming to handle data from individual suppliers
  – Offers two levels of richness (level 1 & level 2)
• Applications of ONIX:
  – Primarily oriented towards jobbers and publishers
  – Most major players (Amazon, Baker & Taylor, etc.) now using/supporting
  – Some interest in implementation in library systems

Questions & Answers

Links: Major emphasis in this presentation

• MARC 21: http://lcweb.loc.gov/marc/marcdocz.html
• MARCXML: http://www.loc.gov/marc/marcxml.html
• XMLMARC: http://laneweb.stanford.edu:2380/wiki/medlane/xmlmarc
• BiblioML (UNIMARC XML): http://www.culture.fr/BiblioML
• MODS: http://www.loc.gov/standards/mods
• METS: http://www.loc.gov/standards/mets
• MIX: http://www.loc.gov/standards/mix
• TEI: http://www.tei-c.org
• EAD: http://www.loc.gov/ead
• ONIX: http://www.editeur.org/onix.html

Further reading on MARCXML, MODS, METS:
“New Metadata Standards for Digital Resources,” Bulletin of the American Society for Information Science and Technology, Dec/Jan 2003, pp. 12-15. http://www.asis.org/Bulletin/Dec-02/ASISTDecJan.pdf

Links: Also appearing (in Big Picture)

• SCORM: http://www.adlnet.org/index.cfm?fuseaction=scormabt
• RSLP: http://www.ukoln.ac.uk/metadata/rslp
• VRA Core: http://www.vraweb.org/vracore3.htm
• IMS LOM: http://www.imsglobal.org/metadata
• CSDGM: http://www.fgdc.gov/metadata/contstan.html
• GEM: http://www.geminfo.org/Workbench
• CIMI: http://www.cimi.org/old_site/standards
