METS: An Introduction Part II METS Mechanisms

Upload: austin-rice

Post on 22-Dec-2015



What is METS?

• An XML-based standard for encoding “hub” documents for materials whose content is digital.

– XML is a markup language like SGML.

– A hub document draws together dispersed but related digital files and content

– METS uses XML to provide a vocabulary and syntax for identifying the digital pieces that together comprise a digital entity, for specifying the location of these pieces, and for expressing the relationships between these digital pieces

What is XML?

• Stands for Extensible Markup Language

• Markup Language like SGML (of which HTML is a flavor)

• Intended to serve many of the same purposes as SGML, only better

XML: Key Vocabulary & Concepts

1. Elements and Attributes

2. Pattern and Instance Documents

3. Namespaces

Elements and Attributes

• XML documents consist of a hierarchically arranged sequence of elements.

• An element consists of:

– Element tag delimited by angle brackets. The tag contains:
    • Element name
    • Element attributes: [attribute name]="[attribute value]"

– Element content or value. Elements don’t always have a value; they may simply serve a structural purpose.

– Nested elements: an element can contain other elements

– Element close tag: </[element name]>

Element Example 1

<metsHdr CREATEDATE="2001-10-23T00:00:00">
  <agent ROLE="CREATOR">
    <name>Rick Beaubien</name>
  </agent>
</metsHdr>

Element Example 2

<structMap>
  <div TYPE="QUAD15" LABEL="San Francisco Quad">
    <fptr FILEID="FID1"/>
    <fptr FILEID="FID20"/>
    <div TYPE="map" LABEL="1895" DMDID="DM2">
      <fptr FILEID="FID2"/>
      <fptr FILEID="FID14"/>
      <fptr FILEID="FID8"/>
    </div>
  </div>
</structMap>
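
To make the element/attribute vocabulary concrete, here is a minimal sketch that parses Element Example 2 with Python's standard `xml.etree.ElementTree` module and lists each element with its nesting depth and attributes. The helper name `outline` is an illustrative choice, not part of METS or of the slides.

```python
import xml.etree.ElementTree as ET

# Element Example 2 from the slides, written with plain ASCII quotes.
DOC = """
<structMap>
  <div TYPE="QUAD15" LABEL="San Francisco Quad">
    <fptr FILEID="FID1"/>
    <fptr FILEID="FID20"/>
    <div TYPE="map" LABEL="1895" DMDID="DM2">
      <fptr FILEID="FID2"/>
      <fptr FILEID="FID14"/>
      <fptr FILEID="FID8"/>
    </div>
  </div>
</structMap>
"""

def outline(element, depth=0):
    """Return one (depth, name, attributes) tuple per element, in document order."""
    rows = [(depth, element.tag, dict(element.attrib))]
    for child in element:  # iterating an element yields its nested child elements
        rows.extend(outline(child, depth + 1))
    return rows

root = ET.fromstring(DOC)
rows = outline(root)
for depth, name, attrs in rows:
    print("  " * depth + name, attrs)
```

The `<fptr>` elements carry no content at all; they exist only for their attributes, which is the "structural purpose" case described above.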

Patterns and Instances

• Two main categories of documents in XML:

– “Pattern” or “rules” document: specifies the vocabulary and syntax to which a particular type of XML instance document must adhere

– “Instance” document:
    • Follows the rules specified in its governing pattern document
    • Uses these rules to instantiate a particular digital entity

Pattern Types

• Two main types of pattern documents:

– DTDs: Document Type Definitions
    • Carryover from SGML
    • DTDs are not themselves expressed in XML
    • Example: MOA2.DTD

– Schemas
    • Can express the patterns or rules governing a particular document type
    • Can also just define a set of attributes or elements that are intended for use in a variety of other contexts
    • Schemas are themselves XML documents
    • Controlled by a DTD
    • Example: METS.xsd

What do Schemas and DTDs Specify?

• Element level
– Names
– Namespaces
– Sequence/nesting
– Data types of content
– Required/optional/repeatable
– Attributes

• Attribute level
– Names
– Data types
– Namespace

XML: Namespaces

• Each XML Schema “pattern” document can create a unique target “namespace” that will be associated with it.

• Elements/attributes defined in the schema are said to belong to the declared target namespace.

• A schema can reference elements and attributes from external namespaces, and allow them to be used in specific contexts in the instance documents it governs:
– Specific elements/attributes from specific namespaces
– Any element from any namespace

Incorporating Specific Elements

• A schema can provide for the use of specific elements or attributes from external namespaces in specific contexts.

• Elements/attributes from external namespaces must be preceded by a prefix identifying the namespace, followed by a “:”. Elements from the primary namespace may also carry a namespace prefix. Example (from a METS instance document):

<METS:file ID="FID1">
  <METS:FLocat LOCTYPE="URL" xlink:href="http://sunsite.berkeley.edu/brk10a.jpg"/>
</METS:file>

Allowing Any External Element

• A schema can provide for the use of any element from any external namespace in specific contexts.

• Examples (from METS instance documents):

<METS:dmdSec ID="DM2">
  <METS:mdWrap MDTYPE="OTHER">
    <METS:xmlData>
      <gdm:gdm>
        <gdm:title>[Patrick Breen Diary]</gdm:title>
        <gdm:creator>Breen, Patrick</gdm:creator>
      </gdm:gdm>
    </METS:xmlData>
  </METS:mdWrap>
</METS:dmdSec>

Allowing Any External Element (cont’d)

<METS:dmdSec ID="DM3">
  <METS:mdWrap MDTYPE="OTHER">
    <METS:xmlData>
      <dc:dc>
        <dc:title>[Patrick Breen Diary]</dc:title>
        <dc:creator>Breen, Patrick</dc:creator>
      </dc:dc>
    </METS:xmlData>
  </METS:mdWrap>
</METS:dmdSec>
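
The slides show namespace prefixes but not the `xmlns` declarations that bind them. The sketch below wraps the `DM3` example in a `<METS:mets>` root that declares the METS and Dublin Core namespace URIs (an addition for illustration; the slides do not show these declarations) and reads the wrapped metadata with Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map used for lookups. The prefixes are local shorthand;
# what identifies a namespace is the URI.
NS = {
    "METS": "http://www.loc.gov/METS/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# The DM3 example, plus xmlns declarations that the slides leave out.
DOC = """
<METS:mets xmlns:METS="http://www.loc.gov/METS/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <METS:dmdSec ID="DM3">
    <METS:mdWrap MDTYPE="OTHER">
      <METS:xmlData>
        <dc:dc>
          <dc:title>[Patrick Breen Diary]</dc:title>
          <dc:creator>Breen, Patrick</dc:creator>
        </dc:dc>
      </METS:xmlData>
    </METS:mdWrap>
  </METS:dmdSec>
</METS:mets>
"""

root = ET.fromstring(DOC)
xml_data = root.find("METS:dmdSec/METS:mdWrap/METS:xmlData", NS)
title = xml_data.find("dc:dc/dc:title", NS).text
creator = xml_data.find("dc:dc/dc:creator", NS).text

# ElementTree stores qualified names as {namespace-uri}local-name.
print(xml_data.tag)
print(title, "/", creator)
```

Because `<xmlData>` accepts any element from any namespace, the same lookup pattern would work for the `gdm` example once its namespace URI is known.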

Intro to XML: Conclusion

• Key vocabulary and concepts:
– Building blocks: elements & attributes
– Controls: schemas, DTDs & instance documents
– Mix and match: namespaces

• Limits of presentation:
– XML presentation very crude & restricted
– Not covered: how to read or create XML Schemas
– Examples are all from METS instance documents. We will not look at the METS schema itself, just at what it specifies.

Building a METS Document: The Framework

<METS:mets>

<METS:metsHdr /> Header

<METS:dmdSec /> Descriptive MD

<METS:amdSec /> Administrative MD

<METS:fileSec /> File list

<METS:structMap /> Structural Map

<METS:behaviorSec /> Behavior Section

</METS:mets>

METS Diagrammed

[Diagram: the five parts of a METS document and their roles. Structure: structMap (div). Content: fileSec (fileGrp, file). Administrative Md: amdSec (techMD, sourceMD, digiprovMD, rightsMD). Descriptive Md: dmdSec (repeatable). Behavior: behaviorSec (repeatable).]

Building a METS Document: 5 key aspects

1. Expressing the Structure
2. Linking Structure with Content
3. Linking Structure with Descriptive Metadata
4. Linking Structure and Content Files with Administrative Metadata
5. Not covered: Linking behaviors with structures

Building a METS Document: Aspect 1

1. Expressing the Structure
  a. Key elements:
    i. <structMap>: structure is expressed in the context of a <structMap> element.
    ii. <div>:
      • Structure is expressed through a hierarchy of <div> elements
      • <div> elements can be nested to any depth

Expressing structure: Add <div>s

<METS:mets TYPE="diary" LABEL="Breen Diary">
  <METS:dmdSec />
  <METS:amdSec />
  <METS:fileSec />
  <METS:structMap TYPE="physical">
    <METS:div ORDER="1" TYPE="diary" LABEL="Breen Diary">
      <METS:div ORDER="1" TYPE="page" LABEL="Page 1" />
      <METS:div ORDER="2" TYPE="page" LABEL="Page 2" />
    </METS:div>
  </METS:structMap>
</METS:mets>

<structMap> Element

• Each <structMap> expresses a structure for the digital entity represented
– A METS object may contain more than one <structMap>

• Attributes:
– TYPE: logical, physical, or another type of structure
– LABEL: clarifies the purpose of the structMap (type of structure) to the user

<div> Element (structMap)

• Each <div> represents a logical or physical segment of the digital entity represented. Root <div> represents entire object.

• Attributes:– ORDER: order among siblings

– ORDERLABEL: string representation of ORDER

– LABEL: identifies div to end user (as part of TOC)

– TYPE: type of division (chapter, page, entry, photograph, etc).
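
As a rough illustration of how the `<div>` attributes can drive a table of contents, the sketch below walks a `<structMap>`, sorts sibling `<div>`s by ORDER, and prints LABEL and TYPE with indentation. The function name `toc` is hypothetical, and the sample deliberately lists the pages out of document order to show that ORDER, not position, decides the sequence here.

```python
import xml.etree.ElementTree as ET

NS = {"METS": "http://www.loc.gov/METS/"}

# Based on the "Expressing structure" slide; pages are swapped on purpose.
DOC = """
<METS:mets xmlns:METS="http://www.loc.gov/METS/">
  <METS:structMap TYPE="physical">
    <METS:div ORDER="1" TYPE="diary" LABEL="Breen Diary">
      <METS:div ORDER="2" TYPE="page" LABEL="Page 2"/>
      <METS:div ORDER="1" TYPE="page" LABEL="Page 1"/>
    </METS:div>
  </METS:structMap>
</METS:mets>
"""

def toc(div, depth=0):
    """Return indented 'LABEL (TYPE)' lines for a <div> and its descendants."""
    lines = ["  " * depth + f"{div.get('LABEL')} ({div.get('TYPE')})"]
    # Siblings are presented by ORDER; a missing ORDER sorts first (assumption).
    children = sorted(div.findall("METS:div", NS),
                      key=lambda d: int(d.get("ORDER", "0")))
    for child in children:
        lines.extend(toc(child, depth + 1))
    return lines

root = ET.fromstring(DOC)
root_div = root.find("METS:structMap/METS:div", NS)
toc_lines = toc(root_div)
print("\n".join(toc_lines))
```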

Building a METS object: Aspect 2

2. Linking Structure with Content
  a) Key elements and attributes:
    i. <fptr>: links <div>s with <file> element(s) in the <fileSec>, either via its FILEID attribute or via
      a. <area>: points to a segment within a <file>
      b. <seq>: points to files that must be played in sequence
      c. <par>: points to files that must be played in parallel
    ii. <mptr>: links a <div> with an independent, external METS object via a URI
    iii. <file>: element in the <fileSec> that points to a content file and/or itself contains the file contents. Links to an external file via
      a. <FLocat>: points via URI to an external content file

Linking Structure with Content

[Diagram: Structure side: structMap > div > fptr (containing area, seq > area, or par > area) and mptr. Content side: fileSec > fileGrp > file > FLocat. Arrows run from fptr and area to file.]

Linking in Simple Content 1

<METS:mets>
  <METS:fileSec>
    <METS:fileGrp VERSDATE="2000-08-22T06:32:00">
      <METS:file ID="FID3" MIMETYPE="image/gif">
        <METS:FLocat LOCTYPE="URL" xlink:href="http:…" />
      </METS:file>
    </METS:fileGrp>
  </METS:fileSec>
  <METS:structMap TYPE="physical">
    <METS:div ORDER="1" TYPE="diary" LABEL="Breen Diary">
      <METS:div ORDER="1" TYPE="page" LABEL="Page 1">
        <METS:fptr FILEID="FID3" />
        <METS:fptr FILEID="FID35" />
      </METS:div>
      <METS:div ORDER="2" TYPE="page" LABEL="Page 2" />
    </METS:div>
  </METS:structMap>
</METS:mets>

Linking in Simple Content 2

<METS:fileSec>
  <METS:fileGrp VERSDATE="2000-08-22T06:32:00">
    <METS:file ID="FID3" MIMETYPE="image/gif">
      <METS:FLocat LOCTYPE="URL" xlink:href="http:…" />
    </METS:file>
  </METS:fileGrp>
  <METS:fileGrp VERSDATE="2000-08-22T07:32:00">
    <METS:file ID="FID35" MIMETYPE="image/jpeg">
      <METS:FLocat LOCTYPE="URL" xlink:href="http:…" />
    </METS:file>
  </METS:fileGrp>
</METS:fileSec>

<fptr> Element (structMap.div)

• <div> element will contain an <fptr> element for each available manifestation of the <div>: thumbnail, med-res jpeg, hi-res jpeg, etc

• <fptr> points to the associated content <file> or <file>s in the <fileSec>.
– In the case of simple content, it points directly to the associated content <file> via the FILEID attribute
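
The simple-content link can be followed programmatically: take each `<fptr>` of a `<div>`, look up the `<file>` whose ID matches its FILEID, and read the `xlink:href` of that file's `<FLocat>`. The sketch below assumes the two-`<fileGrp>` example from the preceding slides; the `example.org` URLs are placeholders (the slides elide the real locations) and `manifestations` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

NS = {"METS": "http://www.loc.gov/METS/"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Placeholder URLs: the slides show only "http:…".
DOC = """
<METS:mets xmlns:METS="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <METS:fileSec>
    <METS:fileGrp VERSDATE="2000-08-22T06:32:00">
      <METS:file ID="FID3" MIMETYPE="image/gif">
        <METS:FLocat LOCTYPE="URL" xlink:href="http://example.org/page1.gif"/>
      </METS:file>
    </METS:fileGrp>
    <METS:fileGrp VERSDATE="2000-08-22T07:32:00">
      <METS:file ID="FID35" MIMETYPE="image/jpeg">
        <METS:FLocat LOCTYPE="URL" xlink:href="http://example.org/page1.jpg"/>
      </METS:file>
    </METS:fileGrp>
  </METS:fileSec>
  <METS:structMap TYPE="physical">
    <METS:div ORDER="1" TYPE="diary" LABEL="Breen Diary">
      <METS:div ORDER="1" TYPE="page" LABEL="Page 1">
        <METS:fptr FILEID="FID3"/>
        <METS:fptr FILEID="FID35"/>
      </METS:div>
    </METS:div>
  </METS:structMap>
</METS:mets>
"""

def manifestations(root, div):
    """Follow each simple <fptr> of a <div> to its <file> and <FLocat> href."""
    files = {f.get("ID"): f for f in root.iterfind(".//METS:file", NS)}
    found = []
    for fptr in div.findall("METS:fptr", NS):
        file_el = files.get(fptr.get("FILEID"))
        if file_el is None:
            continue  # complex content (<area>, <seq>, <par>) or a dangling FILEID
        flocat = file_el.find("METS:FLocat", NS)
        href = flocat.get(XLINK_HREF) if flocat is not None else None
        found.append((file_el.get("ID"), file_el.get("MIMETYPE"), href))
    return found

root = ET.fromstring(DOC)
page1 = root.find("METS:structMap/METS:div/METS:div", NS)
links = manifestations(root, page1)
for link in links:
    print(link)
```

Each tuple corresponds to one manifestation of "Page 1", so a viewer could choose between the GIF and the JPEG by MIME type.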

<fileGrp> Element (fileSec)

• <file> elements are organized into <fileGrp> elements representing versions of the content.
– Example:
    • One <fileGrp> might contain master TIFF versions
    • One <fileGrp> might contain thumbnail versions
    • One <fileGrp> might contain medium-res JPEG versions

• Attributes
– VERSDATE: ISO-format date/time of creation

<file> Element (fileSec.fileGrp)

• <file> element represents a content file

• Main attributes:
– ID: required. Means for linking from <fptr> in <div>
– MIMETYPE
– SEQ
– SIZE: in bytes
– CREATED: ISO-format date/time of creation
– CHECKSUM: MD5 digest value
– OWNERID: primary identifier assigned by owner

• <file> may point to an external file (via a <FLocat> element), contain the actual file contents in Base64 (via a <FContent> element), or both

<FLocat> Element (fileSec.fileGrp.file)

• <FLocat> element points to external content via its xlink:href attribute (as do all METS elements that point to external content)

• Main attributes:
– xlink:SimpleLink attributes: xlink:href, xlink:role, xlink:arcrole, xlink:title, xlink:show, xlink:actuate
– LOCTYPE attribute: specifies the kind of xlink:href provided: URN, URL, PURL, HANDLE, DOI, OTHER

Linking in Complex Content: <area>

<METS:structMap TYPE="physical">
  <METS:div ORDER="1" TYPE="diary" LABEL="Breen Diary">
    <METS:div ORDER="1" TYPE="page" LABEL="Page 1">
      <METS:fptr FILEID="FID3" />
      <METS:fptr FILEID="FID35" />
      <METS:fptr>
        <METS:area FILEID="FID1" BETYPE="IDREF" BEGIN="PAGE1" END="ENDPAGE1" />
      </METS:fptr>
    </METS:div>
    <METS:div ORDER="2" TYPE="entry" LABEL="Nov 21" />
  </METS:div>
</METS:structMap>

<area> Element (structMap.div.fptr)

• <area> element links a <div> to a segment of a content file

• <area> element provides numerous attributes for specifying an area within a file. These include:
– SHAPE (HTML4 conventions: circle, poly, rect)
– COORDS (HTML4 conventions)
– BEGIN
– END
– BETYPE: BYTE, IDREF, SMIL, MIDI, SMPTE, TIME, TCF
– EXTENT (duration)
– EXTTYPE: BYTE, SMIL, MIDI, SMPTE, TIME, TCF

Linking in Complex Content: <seq>

<METS:structMap TYPE="logical">
  <METS:div ORDER="1" TYPE="diary" LABEL="Breen Diary">
    <METS:div ORDER="1" TYPE="entry" LABEL="Nov 20">
      <METS:fptr>
        <METS:seq>
          <METS:area FILEID="FID2" />
          <METS:area FILEID="FID3" />
          <METS:area FILEID="FID4" />
        </METS:seq>
      </METS:fptr>
    </METS:div>
  </METS:div>
</METS:structMap>

<seq> Element (structMap.div.fptr.seq)

• <fptr> element may link to content via a <seq> element

• <seq> element uses multiple <area> elements to identify files or parts of files that must be displayed/played in sequence to express the content of the associated <div>.

Linking in Complex Content: <par> element

<METS:div ORDER="1" TYPE="mmDiary" LABEL="Breen Diary">
  <METS:div ORDER="1" TYPE="page" LABEL="Page 1">
    <METS:fptr>
      <METS:par>
        <METS:area FILEID="FID2" />  <!-- image file -->
        <METS:area FILEID="FID33" BETYPE="TIME" BEGIN="00:00:00" END="00:01:00" />  <!-- sound file -->
      </METS:par>
    </METS:fptr>
  </METS:div>
</METS:div>

<par> Element (structMap.div.fptr)

• <fptr> element may link to content via a <par> element

• <par> element uses multiple <area> elements to identify files or parts of files that must be displayed/played in parallel to express content.

Linking in External METS object: <mptr>

<METS:structMap TYPE="logical">
  <METS:div ORDER="1" TYPE="diary" LABEL="Breen Diary">
    <METS:div ORDER="1" TYPE="entry" LABEL="Nov 20">
      <METS:fptr FILEID="FID3" />
      <METS:fptr FILEID="FID35" />
      <METS:fptr>
        <METS:area FILEID="FID1" BETYPE="IDREF" BEGIN="ENTRY1" END="ENTRYEND1" />
      </METS:fptr>
    </METS:div>
    <METS:div ORDER="2" TYPE="entry" LABEL="Nov 21" />
    …
    <METS:div ORDER="35" TYPE="letter" LABEL="Letter from …">
      <METS:mptr LOCTYPE="URL" xlink:href="http://…/l.xml" />
    </METS:div>
  </METS:div>
</METS:structMap>

<mptr> Element (structMap.div.mptr)

• A <div> in a <structMap> may want to “pass the baton” to an external METS object

• An <mptr> element is used for this purpose

• Main attributes:
– xlink:SimpleLink attributes: xlink:href, xlink:role, xlink:arcrole, xlink:title, xlink:show, xlink:actuate
– LOCTYPE attribute: specifies the kind of xlink:href provided: URN, URL, PURL, HANDLE, DOI, OTHER

Summary: Linking Structure with Content

• Structure is expressed in the <structMap> through a hierarchy of <div>s

• <div>s are linked to content by means of <fptr> elements and/or <mptr> elements

• Each <fptr> or <mptr> associated with the <div> represents a manifestation of the <div>

Summary: Linking Structure with Content (cont’d)

• <fptr> element may point to content in four ways:
– <fptr> may point directly to a <file> element in the <fileSec>
– <fptr> may contain an <area> element that points to a segment of a file in the <fileSec>
– <fptr> may contain a <seq> element. The <seq> element contains a sequence of <area> elements that point to <file>s or segments of <file>s that must be played/displayed in sequence
– <fptr> may contain a <par> element. The <par> element contains multiple <area> elements that point to <file>s or segments of <file>s that must be played/displayed in parallel
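
The four ways an `<fptr>` can point to content can be told apart mechanically. The sketch below is one possible classification, not a prescribed METS algorithm: it reports the kind of pointer and the FILEIDs it ultimately references. The sample combines the `<area>`, `<seq>` and `<par>` slides into a single `<div>` for brevity, and `classify` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

NS = {"METS": "http://www.loc.gov/METS/"}
AREA = "{http://www.loc.gov/METS/}area"

DOC = """
<METS:mets xmlns:METS="http://www.loc.gov/METS/">
  <METS:structMap TYPE="physical">
    <METS:div ORDER="1" TYPE="page" LABEL="Page 1">
      <METS:fptr FILEID="FID3"/>
      <METS:fptr>
        <METS:area FILEID="FID1" BETYPE="IDREF" BEGIN="PAGE1" END="ENDPAGE1"/>
      </METS:fptr>
      <METS:fptr>
        <METS:seq>
          <METS:area FILEID="FID2"/>
          <METS:area FILEID="FID3"/>
          <METS:area FILEID="FID4"/>
        </METS:seq>
      </METS:fptr>
      <METS:fptr>
        <METS:par>
          <METS:area FILEID="FID2"/>
          <METS:area FILEID="FID33" BETYPE="TIME" BEGIN="00:00:00" END="00:01:00"/>
        </METS:par>
      </METS:fptr>
    </METS:div>
  </METS:structMap>
</METS:mets>
"""

def classify(fptr):
    """Return (kind, [FILEIDs]) where kind is file, area, seq, par or empty."""
    if fptr.get("FILEID") is not None:
        return ("file", [fptr.get("FILEID")])  # direct pointer to a <file>
    for kind in ("area", "seq", "par"):
        if fptr.find(f"METS:{kind}", NS) is not None:
            # Every referenced file is named on an <area>, whatever the wrapper.
            return (kind, [a.get("FILEID") for a in fptr.iter(AREA)])
    return ("empty", [])

root = ET.fromstring(DOC)
div = root.find("METS:structMap/METS:div", NS)
kinds = [classify(f) for f in div.findall("METS:fptr", NS)]
for kind in kinds:
    print(kind)
```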

Summary: Linking Structure with Content (cont’d)

• <mptr> element may point to external METS object.

Building a METS object: Aspect 3

3. Linking Structure with Descriptive Metadata
  a) Key elements and attributes
    i. A <div> element in the <structMap> may link to one or more <dmdSec> elements via a DMDID attribute.
    ii. A <dmdSec> may
      a. point to external descriptive metadata via a <mdRef> element
      b. itself contain descriptive metadata in an <mdWrap> element

Linking Structure with Descriptive Metadata

[Diagram: Structure side: structMap > div. Descriptive Md side: one dmdSec containing mdRef and one dmdSec containing mdWrap. Arrows run from div to each dmdSec.]

Linking to External Descriptive Metadata: DMDID

<METS:structMap TYPE="logical">
  <METS:div ORDER="1" TYPE="diary" LABEL="Breen Diary" DMDID="DM1">
    <METS:div ORDER="1" TYPE="entry" LABEL="Nov 20">
      <METS:fptr FILEID="FID3" />
      <METS:fptr FILEID="FID35" />
      <METS:fptr>
        <METS:area FILEID="FID1" BETYPE="IDREF" BEGIN="ENTRY1" END="ENTRYEND1" />
      </METS:fptr>
    </METS:div>
    <METS:div ORDER="2" TYPE="entry" LABEL="Nov 21" />
  </METS:div>
</METS:structMap>

Linking to External Descriptive Metadata: <mdRef>

<METS:dmdSec ID="DM1">
  <METS:mdRef LOCTYPE="URL" MDTYPE="EAD" xlink:href="http://…/breen" LABEL="Finding Aid" />
</METS:dmdSec>

<mdRef> Element (dmdSec)

• <mdRef> element in the context of the <dmdSec> points to external descriptive metadata (finding aid, catalog record)

• <mdRef> element provides numerous attributes for qualifying a metadata reference:
– METS standard linking attributes: xlink:SimpleLink
– LOCTYPE, OTHERLOCTYPE
– MIMETYPE
– MDTYPE (MARC, EAD, DC)
– OTHERMDTYPE
– LABEL
– XPTR (XPointer to a location within the file)

Linking to Internal Descriptive Metadata: DMDID 2

<METS:structMap TYPE="logical">
  <METS:div ORDER="1" TYPE="diary" LABEL="Breen Diary" DMDID="DM1 DM2">
    <METS:div ORDER="1" TYPE="entry" LABEL="Nov 20">
      <METS:fptr FILEID="FID3" />
      <METS:fptr FILEID="FID35" />
      <METS:fptr>
        <METS:area FILEID="FID1" BETYPE="IDREF" BEGIN="ENTRY1" END="ENTRYEND1" />
      </METS:fptr>
    </METS:div>
    <METS:div ORDER="2" TYPE="entry" LABEL="Nov 21" />
  </METS:div>
</METS:structMap>

Linking to Internal Descriptive Metadata: <mdWrap>

<METS:dmdSec ID="DM1">
  <METS:mdRef LOCTYPE="URL" MDTYPE="EAD" xlink:href="http://…/breen" LABEL="Finding Aid" />
</METS:dmdSec>
<METS:dmdSec ID="DM2">
  <METS:mdWrap MDTYPE="OTHER" OTHERMDTYPE="GDM">
    <METS:xmlData>
      <gdm:gdm>
        <gdm:core>
          <gdm:coreDate>1846</gdm:coreDate>
          <gdm:title>[Patrick Breen Diary…]</gdm:title>
        </gdm:core>
        <gdm:creator ROLE="Author">Breen, Patrick</gdm:creator>
      </gdm:gdm>
    </METS:xmlData>
  </METS:mdWrap>
</METS:dmdSec>

<mdWrap> Element (dmdSec)

• <mdWrap> provides a wrapper for metadata

• <mdWrap> may wrap an <xmlData> element containing metadata encoded according to an external schema: DC, MARCLITE, GDM, etc.

• <mdWrap> may wrap a <binData> element containing base64Binary encoded data

• Attributes:
– MIMETYPE
– MDTYPE: MARC, EAD, DC, etc.
– OTHERMDTYPE: if MDTYPE is OTHER
– LABEL: for presentation to end user

Summary: Linking Structure with Descriptive Metadata

• <div>s are linked to <dmdSec> elements by means of DMDID attribute containing idref(s).

• <div> at any level of the <structMap> hierarchy may reference a <dmdSec>

• Each <dmdSec> references or contains a discrete unit of descriptive metadata

• A <dmdSec> can (either/both)
– reference external metadata via a <mdRef> element
– wrap metadata via an <mdWrap> element:
    • XML-encoded metadata conforming to an external schema
    • base64Binary encoded metadata such as a MARC record
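
A DMDID attribute holds one or more whitespace-separated ID references, so resolving it amounts to splitting the value and looking up each `<dmdSec>` by ID. The sketch below does that and reports whether each section references (`<mdRef>`) or wraps (`<mdWrap>`) its metadata. The `example.org` URL stands in for the elided finding-aid address, the wrapped GDM content is left out for brevity, and `describe` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

NS = {"METS": "http://www.loc.gov/METS/"}

# Placeholder href: the slides show only "http://…/breen".
DOC = """
<METS:mets xmlns:METS="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <METS:dmdSec ID="DM1">
    <METS:mdRef LOCTYPE="URL" MDTYPE="EAD" LABEL="Finding Aid"
                xlink:href="http://example.org/breen"/>
  </METS:dmdSec>
  <METS:dmdSec ID="DM2">
    <METS:mdWrap MDTYPE="OTHER" OTHERMDTYPE="GDM">
      <METS:xmlData/>
    </METS:mdWrap>
  </METS:dmdSec>
  <METS:structMap TYPE="logical">
    <METS:div ORDER="1" TYPE="diary" LABEL="Breen Diary" DMDID="DM1 DM2"/>
  </METS:structMap>
</METS:mets>
"""

def describe(root, div):
    """Resolve a <div>'s DMDID idrefs to (ID, mechanism, metadata type) tuples."""
    sections = {s.get("ID"): s for s in root.findall("METS:dmdSec", NS)}
    found = []
    for idref in div.get("DMDID", "").split():
        section = sections.get(idref)
        if section is None:
            continue  # dangling reference; a validator would flag this
        for mechanism in ("mdRef", "mdWrap"):
            el = section.find(f"METS:{mechanism}", NS)
            if el is not None:
                md_type = el.get("MDTYPE")
                if md_type == "OTHER":
                    md_type = el.get("OTHERMDTYPE", "OTHER")
                found.append((idref, mechanism, md_type))
    return found

root = ET.fromstring(DOC)
diary = root.find("METS:structMap/METS:div", NS)
units = describe(root, diary)
for unit in units:
    print(unit)
```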

Building a METS object: Aspect 4

4. Linking Structure and Files with Administrative Metadata
  a) Key attributes and elements:
    i. <div> elements in the <structMap> may link to one or more administrative metadata units via an ADMID attribute.
    ii. <file> elements in the <fileSec> may link to one or more administrative metadata units via an ADMID attribute.
    iii. <amdSec>, <techMD>, <rightsMD>, <sourceMD> and <digiprovMD> elements may
      a. point to external administrative metadata via a <mdRef> element
      b. themselves contain administrative metadata in an <mdWrap> element

Linking Structure and Content with Administrative Md

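[Diagram: Structure side: structMap > div. Content side: fileSec > fileGrp > file. Administrative Md side: amdSec > techMD, sourceMD, digiprovMD, rightsMD, each containing mdRef or mdWrap. Arrows run from div and file to the administrative metadata elements.]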

Linking <div> to Admin Md: Adding ADMID

<METS:structMap TYPE="logical">
  <METS:div ORDER="1" TYPE="diary" LABEL="Breen Diary" DMDID="DM1 DM2" ADMID="RM1">
    <METS:div ORDER="1" TYPE="entry" LABEL="Nov 20">
      <METS:fptr FILEID="FID3" />
      <METS:fptr FILEID="FID35" />
      <METS:fptr>
        <METS:area FILEID="FID1" BETYPE="IDREF" BEGIN="ENTRY1" END="ENTRYEND1" />
      </METS:fptr>
    </METS:div>
    <METS:div ORDER="2" TYPE="entry" LABEL="Nov 21" />
  </METS:div>
</METS:structMap>

Linking to Administrative Md: Adding <rightsMD>, <mdWrap>

<METS:amdSec>
  <METS:rightsMD ID="RM1">
    <METS:mdWrap MDTYPE="OTHER" OTHERMDTYPE="GAMRIGHTS">
      <METS:xmlData>
        <gamrights:gamrights>
          <gamrights:copyRest>Copyright has been assigned to the Bancroft Library. All requests…</gamrights:copyRest>
        </gamrights:gamrights>
      </METS:xmlData>
    </METS:mdWrap>
  </METS:rightsMD>
</METS:amdSec>

<amdSec> Element

• <amdSec> expresses administrative metadata through 4 repeatable elements:
– <rightsMD>
– <techMD>
– <sourceMD>
– <digiprovMD>

• Each of these elements expresses administrative metadata by the same means that <dmdSec> uses for descriptive metadata:
– <mdRef>: can point to external metadata
– <mdWrap>: wraps metadata internally

• <div>s, <file>s and <fileGrp>s can link to <rightsMD>, <techMD>, <sourceMD>, <digiprovMD> or the parent <amdSec> via ADMID.

Linking <file> to Admin MD: Add ADMID

<METS:fileSec>
  <METS:fileGrp VERSDATE="2000-08-22T07:32:00">
    <METS:file ID="FID55" ADMID="TM1 SM1" MIMETYPE="image/tiff">
      <METS:FLocat LOCTYPE="URL" xlink:href="http:…/x.tif" />
    </METS:file>
  </METS:fileGrp>
</METS:fileSec>

Linking <file> to Admin MD: <techMD>

<METS:amdSec>
  <METS:techMD ID="TM1">
    <METS:mdWrap MDTYPE="OTHER" OTHERMDTYPE="GAMTECH">
      <METS:xmlData>
        <gamtech:gamtech>
          <gamtech:compression>LZW</gamtech:compression>
          <gamtech:resolution>800</gamtech:resolution>
        </gamtech:gamtech>
      </METS:xmlData>
    </METS:mdWrap>
  </METS:techMD>
</METS:amdSec>

Linking <file> to Admin MD: <sourceMD>

<METS:amdSec>
  <METS:sourceMD ID="SM1">
    <METS:mdWrap MDTYPE="OTHER" OTHERMDTYPE="GAMSOURCE">
      <METS:xmlData>
        <gamsource:gamsource>
          <gamsource:sourceID>BANC MSS C-E 176</gamsource:sourceID>
          <gamsource:orgDimen X="12" Y="17" UNIT="cm" />
        </gamsource:gamsource>
      </METS:xmlData>
    </METS:mdWrap>
  </METS:sourceMD>
</METS:amdSec>

Summary: Linking Structure and files with Admin Metadata

• <div>s are linked to admin md elements by means of ADMID attribute containing idref(s).

• <div> at any level of the <structMap> hierarchy may reference <rightsMD> or other amd element

• <file>s and <fileGrp>s are linked to admin md elements by means of ADMID attribute. May link to <techMD>, <rightsMD>, <sourceMD>, <digiprovMD>, or entire <amdSec>

• Each <techMD>, <rightsMD>, <sourceMD>, <digiprovMD> references or contains a discrete unit of administrative metadata

Summary: Linking Structure and files with Admin Metadata (cont)

• <techMD>, <rightsMD>, <sourceMD>, <digiprovMD> can (either/both)
– reference external metadata via a <mdRef> element
    • <mdRef> uses xlink:SimpleLink attributes to point to external administrative metadata
– wrap metadata (either/or)
    • XML-encoded metadata conforming to an external schema in a <xmlData> element
    • base64Binary encoded metadata in a <binData> element
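
ADMID works like DMDID, except that the targets can be any of the four administrative metadata elements or an `<amdSec>` itself. The sketch below resolves the `ADMID="TM1 SM1"` example from the `<file>` slide and reports which kind of unit each ID names. The `gamtech` namespace URI is invented for illustration (the slides do not give one), the `<sourceMD>` body is left empty for brevity, and `admin_units` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

NS = {"METS": "http://www.loc.gov/METS/"}

# The gamtech namespace URI below is a made-up placeholder.
DOC = """
<METS:mets xmlns:METS="http://www.loc.gov/METS/"
           xmlns:gamtech="http://example.org/gamtech">
  <METS:amdSec>
    <METS:techMD ID="TM1">
      <METS:mdWrap MDTYPE="OTHER" OTHERMDTYPE="GAMTECH">
        <METS:xmlData>
          <gamtech:gamtech>
            <gamtech:compression>LZW</gamtech:compression>
          </gamtech:gamtech>
        </METS:xmlData>
      </METS:mdWrap>
    </METS:techMD>
    <METS:sourceMD ID="SM1">
      <METS:mdWrap MDTYPE="OTHER" OTHERMDTYPE="GAMSOURCE">
        <METS:xmlData/>
      </METS:mdWrap>
    </METS:sourceMD>
  </METS:amdSec>
  <METS:fileSec>
    <METS:fileGrp VERSDATE="2000-08-22T07:32:00">
      <METS:file ID="FID55" ADMID="TM1 SM1" MIMETYPE="image/tiff"/>
    </METS:fileGrp>
  </METS:fileSec>
</METS:mets>
"""

def admin_units(root, element):
    """Resolve an element's ADMID idrefs to (ID, local element name) tuples."""
    by_id = {}
    for amd in root.findall("METS:amdSec", NS):
        # Valid targets: the <amdSec> itself and its direct children.
        for unit in [amd, *amd]:
            if unit.get("ID") is not None:
                by_id[unit.get("ID")] = unit
    resolved = []
    for idref in element.get("ADMID", "").split():
        if idref in by_id:
            # Strip the {namespace-uri} part to leave e.g. "techMD".
            resolved.append((idref, by_id[idref].tag.split("}", 1)[1]))
    return resolved

root = ET.fromstring(DOC)
file_el = root.find("METS:fileSec/METS:fileGrp/METS:file", NS)
resolved = admin_units(root, file_el)
print(resolved)
```

The same function accepts a `<div>` or `<fileGrp>` element, since all three carry ADMID in the same form.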

Building a METS object

1. Expressing the Structure
2. Linking Structure with Content
3. Linking Structure with Descriptive Metadata
4. Linking Structure and Files with Administrative Metadata
5. Not covered: Linking behaviors with structures

METS Mechanisms: Conclusion

• METS provides varied and flexible mechanisms for
– expressing the structure or structures of a digital entity
– linking structure with simple and complex content
– linking structure with descriptive metadata
– linking structure and content files with administrative metadata
– linking behaviors with structure and content