metadata standards and applications 4. metadata syntaxes and containers
TRANSCRIPT
Metadata Standards Metadata Standards and Applicationsand Applications
4. Metadata Syntaxes and 4. Metadata Syntaxes and ContainersContainers
Goals of SessionGoals of Session
Understand the origin of and Understand the origin of and differences between the various differences between the various syntaxes used for encoding syntaxes used for encoding information, including HTML, XML information, including HTML, XML and RDFand RDF
Discover how container formats are Discover how container formats are used for managing digital resources used for managing digital resources and their metadataand their metadata
22Metadata Standards & ApplicationsMetadata Standards & Applications
Overview of SyntaxesOverview of Syntaxes
HTML, XHTML: Hypertext Markup HTML, XHTML: Hypertext Markup Language; eXtensible Hypertext Language; eXtensible Hypertext Markup LanguageMarkup Language
XML: Extensible Markup LanguageXML: Extensible Markup LanguageRDF: Resource Description RDF: Resource Description
FrameworkFramework
33Metadata Standards & ApplicationsMetadata Standards & Applications
HTMLHTML
HyperText Markup LanguageHyperText Markup Language
HTML 4 is the current standardHTML 4 is the current standard
HTML is an SGML (Standard Generalized Markup HTML is an SGML (Standard Generalized Markup Language) application conforming to Language) application conforming to International Standard ISO 8879International Standard ISO 8879
Widely regarded as the standard publishing Widely regarded as the standard publishing language of the World Wide Weblanguage of the World Wide Web
HTML addressed the problem of SGML HTML addressed the problem of SGML complexity by specifying a small set of structural complexity by specifying a small set of structural and semantic tags suitable for authoring and semantic tags suitable for authoring relatively simple documentsrelatively simple documents
44Metadata Standards & ApplicationsMetadata Standards & Applications
XHTMLXHTML
XML-ized version of HTML 4.0, tightens up HTML XML-ized version of HTML 4.0, tightens up HTML to match XML syntaxto match XML syntax
– Requires ending tags, quoted attributes, lower Requires ending tags, quoted attributes, lower case, etc., to conform to XML requirementscase, etc., to conform to XML requirements
XHTML is a W3C specification, redefining HTML XHTML is a W3C specification, redefining HTML as an XML implementation, rather than an SGML as an XML implementation, rather than an SGML implementationimplementation
Imposes requirements that are intended to lead Imposes requirements that are intended to lead to more well-formed, valid XML, easier for to more well-formed, valid XML, easier for browsers to handlebrowsers to handle
55Metadata Standards & ApplicationsMetadata Standards & Applications
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" /> <link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" /> <meta name="DC.title" content="Using Dublin Core" /> <meta name="DC.creator" content="Diane Hillmann" /> <meta name="DC.subject" content="documents; Bibliography; Model; meta; Glossary; mark; matching; refinements; XHTML; Controlled; Qualifiers; Hillmann; mixing; encoding; Diane; Issues; Appendix; elements; Simple; Special; element; trademark/service; DCMI; Dublin; pages; Section; Resource; Grammatical; Qualified; XML; Using; Principles; Documents; licensing; OCLC; formal; Usageguide; Roles; Implementing; Contents; Guidelines; Expressing; Table; Syntax; Content; Element; DC.dot; Home; document; Metadata; RDF/XML; Website; metadata; privacy; schemes; liability; profiles; Elements; Copyright; Localization; schemas; HTML/XHTML; Core; Guide; registry; Research; contact; Scope; Projects; languages; Maintenance; Application; available; Internationalization; HTML; Recommended; link; Purpose; Abstract; AskDCMI; Vocabularies; software; Storage; Introduction" /> <meta name="DC.description" content="This document is intended as an entry point for users of Dublin Core. For non-specialists, it will assist them in creating simple descriptive records for information resources (for example, electronic documents). Specialists may find the document a useful point of reference to the documentation of Dublin Core, as it changes and grows." /> <meta name="DC.publisher" content="Dublin Core Metadata Initiative" /> <meta name="DC.type" scheme="DCTERMS.DCMIType" content="Text" /> <meta name="DC.format" content="text/html" /> <meta name="DC.format" content="31250 bytes" /> <meta name="DC.identifier" scheme="DCTERMS.URI" content="http://dublincore.org/documents/usageguide/" />
An XHTML Example
6Metadata Standards & Applications
XMLXML
EExxtensible tensible MMarkup arkup LLanguageanguageA ‘A ‘metamarkup’metamarkup’ languagelanguage: has no : has no
fixed tags or elementsfixed tags or elementsStrict grammar imposes structure Strict grammar imposes structure
designed to be read by machinesdesigned to be read by machinesTwo levels of conformance:Two levels of conformance:
– well-formed--conforms to general grammar well-formed--conforms to general grammar rulesrules
– valid--conforms to particular XML schema or valid--conforms to particular XML schema or DTD (document type definition)DTD (document type definition)
77Metadata Standards & ApplicationsMetadata Standards & Applications
XML is the XML is the lingua franca lingua franca of the Webof the Web
Web pages increasingly use at least XHTMLWeb pages increasingly use at least XHTML Business use for data exchange/messagingBusiness use for data exchange/messaging Family of technologies can be leveragedFamily of technologies can be leveraged
– XML Schema, XSLT, XPath, and XqueryXML Schema, XSLT, XPath, and Xquery Software tools widely available (many open Software tools widely available (many open
source)source)– Storage, editing, parsing, validating, transforming and Storage, editing, parsing, validating, transforming and
publishing XMLpublishing XML Microsoft Office 2003 supports XML as document Microsoft Office 2003 supports XML as document
format (WordML and ExcelML)format (WordML and ExcelML) Web 2.0 applications are based on XMLWeb 2.0 applications are based on XML
88Metadata Standards & ApplicationsMetadata Standards & Applications
An XML Schema May Define:An XML Schema May Define:
What elements may be usedWhat elements may be used Of which typesOf which types Any attributesAny attributes In which orderIn which order Optional or compulsoryOptional or compulsory RepeatabilityRepeatability Sub-elementsSub-elements LogicLogic
99Metadata Standards & ApplicationsMetadata Standards & Applications
Anatomy of an XML RecordAnatomy of an XML Record
XML declaration--prepares the processor to work XML declaration--prepares the processor to work with the documentwith the document
Namespaces (uses xmlns:Namespaces (uses xmlns:prefixprefix and a URI to and a URI to attach a prefix to each element and attribute)attach a prefix to each element and attribute)
– Distinguishes between elements and attributes from Distinguishes between elements and attributes from different vocabularies that might share a name (but different vocabularies that might share a name (but not necessarily a definition) using association with not necessarily a definition) using association with URIs URIs
– Groups all related elements from an application so Groups all related elements from an application so software can deal with themsoftware can deal with them
– The URIs are the standardized bit, not the prefix, The URIs are the standardized bit, not the prefix, and they don’t necessarily lead anywhere useful, and they don’t necessarily lead anywhere useful, even if they look like URLseven if they look like URLs
1010Metadata Standards & ApplicationsMetadata Standards & Applications
XML Anatomy Lesson
<marc:subfield code="a">Metadata in practice /</marc:subfield>
AttributeAttribute
End TagEnd Tag
ContentContentNameName
Start TagStart Tag
11Metadata Standards & Applications
Namespace Anatomy Lesson
xmlns:dc=”http://purl.org/dc/elements/1.1/”
XML NamespaceXML Namespace
Namespace Namespace PrefixPrefix
Namespace Namespace IdentifierIdentifier
12Metadata Standards & Applications
RDFRDF
Resource Description Framework--A Resource Description Framework--A language for describing resources for the language for describing resources for the webweb
Structure based on “triples”Structure based on “triples”Focused on exchange of information Focused on exchange of information
between different kinds of organizations between different kinds of organizations and usagesand usages
Considered an essential part of the Considered an essential part of the Semantic WebSemantic Web
Can be expressed using XMLCan be expressed using XML1313Metadata Standards & ApplicationsMetadata Standards & Applications
Some RDF ConceptsSome RDF Concepts
A A ResourceResource is anything that you want to is anything that you want to describe; it’s most often identified with a URI, describe; it’s most often identified with a URI, such as:such as: http://dublincore.org/documents/usageguide/http://dublincore.org/documents/usageguide/
A A ClassClass is a category; it is a set that comprises is a category; it is a set that comprises individualsindividuals
A A PropertyProperty is a Resource that has a name, such is a Resource that has a name, such as "creator" or "homepage"as "creator" or "homepage"
A A Property valueProperty value is the value of a Property, is the value of a Property, such as ”George Washington" or such as ”George Washington" or ""http://dublincore.orghttp://dublincore.org" (note that a property " (note that a property value can be another resource)value can be another resource)
1414Metadata Standards & ApplicationsMetadata Standards & Applications
RDF StatementsRDF Statements
The combination of a The combination of a ResourceResource, a , a PropertyProperty, , and a and a Property valueProperty value forms a forms a StatementStatement (includes a subject, predicate and object)(includes a subject, predicate and object)
An example An example StatementStatement: "The editor of: "The editor of http://dublincore.org/documents/usageguidehttp://dublincore.org/documents/usageguide/ is / is Diane Hillmann"Diane Hillmann"
– The subject of the statement above is:The subject of the statement above is: http://dublincore.org/documents/usageguide/http://dublincore.org/documents/usageguide/
– The predicate is: editorThe predicate is: editor
– The object is: Diane HillmannThe object is: Diane Hillmann
1515Metadata Standards & ApplicationsMetadata Standards & Applications
RDF and OWLRDF and OWL
RDF does not have the language to specify RDF does not have the language to specify all relationshipsall relationships
Web Ontology Language (OWL) can Web Ontology Language (OWL) can specify richer relationships, such as specify richer relationships, such as equivalence, inverse, uniqueequivalence, inverse, unique
RDF and OWL may be used togetherRDF and OWL may be used together Resource Description Framework Schema Resource Description Framework Schema
(RDFS): a syntax for expressing (RDFS): a syntax for expressing relationships between elementsrelationships between elements
1616Metadata Standards & ApplicationsMetadata Standards & Applications
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/"> <rdf:Description rdf:about="http://www.dlib.org"> <dc:title>D-Lib Program - Research in Digital Libraries</dc:title> <dc:description>The D-Lib program supports the community of people with research interests in digital libraries and electronic publishing.</dc:description> <dc:publisher>Corporation For National Research Initiatives</dc:publisher> <dc:date>1995-01-07</dc:date> <dc:subject> <rdf:Bag> <rdf:li>Research; statistical methods</rdf:li> <rdf:li>Education, research, related topics</rdf:li> <rdf:li>Library use Studies</rdf:li> </rdf:Bag> </dc:subject> <dc:type>World Wide Web Home Page</dc:type> <dc:format>text/html</dc:format> <dc:language>en</dc:language> </rdf:Description></rdf:RDF>
Note Note unordereunordere
d list d list
An XML/RDF Example
1717Metadata Standards & ApplicationsMetadata Standards & Applications
Note Note rdf:aboutrdf:about
Overview of Container FormatsOverview of Container Formats A container format is used to package A container format is used to package
together all forms of metadata and digital together all forms of metadata and digital contentcontent
Use of a container is compatible with, and Use of a container is compatible with, and an implementation of, the OAIS an implementation of, the OAIS information package conceptinformation package concept
METS: packages metadata with objects or METS: packages metadata with objects or links to objects and defines structural links to objects and defines structural relationshipsrelationships
MPEG 21 DID: represents digital objectsMPEG 21 DID: represents digital objects
1818Metadata Standards & ApplicationsMetadata Standards & Applications
METSMETS
Metadata Encoding & Transmission StandardMetadata Encoding & Transmission Standard Developed by the Digital Library Federation, Developed by the Digital Library Federation,
maintained by the Library of Congressmaintained by the Library of Congress ““... an XML document format for encoding ... an XML document format for encoding
metadata necessary for both management of metadata necessary for both management of digital library objects within a repository and digital library objects within a repository and exchange of such objects between repositories exchange of such objects between repositories (or between repositories and their users).”(or between repositories and their users).”
METS is open source and developed by open METS is open source and developed by open discussiondiscussion
Cultural heritage community is the main Cultural heritage community is the main audienceaudience
1919Metadata Standards & ApplicationsMetadata Standards & Applications
METS UsageMETS Usage
To package metadata with digital To package metadata with digital object in XML syntaxobject in XML syntax
For retrieving, storing, preserving, For retrieving, storing, preserving, serving resourceserving resource
For interchange of digital objects with For interchange of digital objects with their metadatatheir metadata
As an information package in a digital As an information package in a digital repository (may be a unit of storage or repository (may be a unit of storage or a transmission format)a transmission format)
Metadata Standards & ApplicationsMetadata Standards & Applications 2020
METS SectionsMETS Sections
Defined in METS schema for navigation & Defined in METS schema for navigation & browsingbrowsing– 1. Header (XML Namespaces)1. Header (XML Namespaces)– 2. File inventory2. File inventory– 3. Structural Map & Links3. Structural Map & Links– 4. Descriptive Metadata (not part of METS but uses an 4. Descriptive Metadata (not part of METS but uses an
externally developed descriptive metadata standard, externally developed descriptive metadata standard, e.g. DC, MODS)e.g. DC, MODS)
– 5. Administrative Metadata (points to external schemas):5. Administrative Metadata (points to external schemas): 1. Technical, Source1. Technical, Source 2. Digital Provenance2. Digital Provenance 3. Rights3. Rights
Metadata Standards & ApplicationsMetadata Standards & Applications 2121
Metadata Standards & ApplicationsMetadata Standards & Applications 2222
METS Extension SchemasMETS Extension Schemas
““Wrappers” or “sockets” where elements Wrappers” or “sockets” where elements from other schemas can be plugged infrom other schemas can be plugged in– Uses the XML Schema facility for combining Uses the XML Schema facility for combining
vocabularies from different Namespacesvocabularies from different Namespaces Endorsed extension schemas:Endorsed extension schemas:
– Descriptive: DC, MODS, MARCXMLDescriptive: DC, MODS, MARCXML– Technical metadata: MIX (image); textMD Technical metadata: MIX (image); textMD
(text)(text)– Preservation related: PREMISPreservation related: PREMIS
Metadata Standards & ApplicationsMetadata Standards & Applications 2323
MPEG-21 Digital Item Declaration MPEG-21 Digital Item Declaration (DID)(DID)
ISO/IEC 21000-2: Digital Item DeclarationISO/IEC 21000-2: Digital Item Declaration– An alternative to represent Digital ObjectsAn alternative to represent Digital Objects– Supported by some repositories, e.g., aDORe, Supported by some repositories, e.g., aDORe,
DSpace, FedoraDSpace, Fedora Model that represents compound objects Model that represents compound objects
(recursive “item”)(recursive “item”) MPEG DID is an ISO standard and has industry MPEG DID is an ISO standard and has industry
support, but it is often implemented in a support, but it is often implemented in a proprietary environment and the standards proprietary environment and the standards development is closed (as is ISO in general)development is closed (as is ISO in general)
Metadata Standards & ApplicationsMetadata Standards & Applications 2424
MPEG 21 Abstract ModelMPEG 21 Abstract Model
Metadata Standards & ApplicationsMetadata Standards & Applications 2525
An ExerciseAn Exercise
Encode a simple resource in both Encode a simple resource in both DC and MARC using XMLDC and MARC using XML
Use the template forms providedUse the template forms provided
2626Metadata Standards & ApplicationsMetadata Standards & Applications