Cornell CS 502
Metadata for the Web: From Discovery to Description
CS 502 – 2002-02-26
Carl Lagoze – Cornell University
Co-existing Cost/Functionality Levels
[Figure: co-existing metadata levels along an axis labeled "Greater Functionality & Cost"]
Dublin Core Qualifiers
• From fuzzy buckets to more specific description
• Model of “graceful degradation”
  – Support both simplicity and specificity
  – Intra-domain and inter-domain semantics
Resource has property X
– Resource: the implied subject
– has: the implied verb
– property: one of 15 properties (DC:Creator, DC:Title, DC:Subject, DC:Date, ...)
– X: the property value (an appropriate literal)
– [optional qualifier], [optional qualifier]: qualifiers act as adjectives
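The statement pattern above can be sketched as a small Python data structure. This is a minimal illustration, not part of any DCMI specification: `DCStatement` and `DC_ELEMENTS` are hypothetical names, and the element list is assumed to be the fifteen elements of DCMES 1.1.

```python
from dataclasses import dataclass
from typing import Optional

# The fifteen elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


@dataclass(frozen=True)
class DCStatement:
    """Hypothetical model of one statement: 'Resource has <element> <value>'.

    The resource is the implied subject and 'has' the implied verb, so only
    the property (element), its value, and optional qualifiers are stored.
    """
    element: str                      # one of the 15 properties (the noun)
    value: str                        # an appropriate literal
    refinement: Optional[str] = None  # optional qualifier narrowing the element
    scheme: Optional[str] = None      # optional qualifier naming an encoding scheme

    def __post_init__(self) -> None:
        # Reject anything that is not one of the fifteen elements.
        if self.element.lower() not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {self.element!r}")

    def __str__(self) -> str:
        parts = [f"Resource has {self.element}"]
        if self.refinement:
            parts.append(f"[{self.refinement}]")
        parts.append(f'"{self.value}"')
        if self.scheme:
            parts.append(f"[{self.scheme}]")
        return " ".join(parts)


if __name__ == "__main__":
    print(DCStatement("Date", "2000-06-13", refinement="Revised", scheme="ISO8601"))
    # prints: Resource has Date [Revised] "2000-06-13" [ISO8601]
```

Keeping the subject and verb implicit mirrors the slide: every statement is about the one resource being described.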
Varieties of qualifiers: Element Refinements
• Make the meaning of an element narrower or more specific.
• Narrowing implies an “is a” relationship
  – a “date created” is a “date”
  – an “is part of” relation is a “relation”
• If your software does not understand the qualifier, you can safely ignore it.
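A sketch of how software might exploit the “is a” property, assuming the dotted DC.Element.Refinement naming used in the HTML examples on later slides. The `REFINES` table and the `broaden` function are hypothetical, and the table lists only a few refinements for illustration.

```python
# Hypothetical excerpt: each element refinement maps to the element it narrows.
REFINES = {
    "created": "date",
    "modified": "date",
    "ispartof": "relation",
    "haspart": "relation",
    "abstract": "description",
    "spatial": "coverage",
    "temporal": "coverage",
}


def broaden(qualified_name: str) -> str:
    """Reduce a dotted name such as 'DC.Date.Created' to its element ('date').

    Because a refinement is an "is a" narrowing of its element, dropping it
    yields a statement that is less specific but still true. Refinements this
    software has never heard of are dropped the same way.
    """
    parts = qualified_name.lower().split(".")
    if parts[0] == "dc":
        parts = parts[1:]
    element, *rest = parts
    # A known refinement attached to the wrong element is a modeling error.
    if rest and rest[0] in REFINES and REFINES[rest[0]] != element:
        raise ValueError(f"{rest[0]!r} does not refine {element!r}")
    return element
```

For example, `broaden("DC.Relation.isPartOf")` returns `"relation"`, and an unfamiliar refinement such as `DC.Date.Reviewed` still reduces safely to `"date"`.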
Varieties of Qualifiers: Value Encoding Schemes
• Says that the value is
  – a term from a controlled vocabulary (e.g., Library of Congress Subject Headings)
  – a string formatted in a standard way (e.g., "2001-05-02" means May 2, not February 5)
• Even if a scheme is not known by software, the value should be "appropriate" and usable for resource discovery.
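A sketch of scheme-aware interpretation with a fallback for unknown schemes. The function name `interpret` is hypothetical; `LCSH_SAMPLE` is a stand-in holding only the LCSH headings that appear on these slides, not the real vocabulary; and only full YYYY-MM-DD dates are handled.

```python
from datetime import date
from typing import Optional, Union

# Hypothetical sample of a controlled vocabulary: only the LCSH headings that
# appear on these slides. A real application would consult the full vocabulary.
LCSH_SAMPLE = frozenset({
    "Languages -- Grammar", "Middle Atlantic States", "Maps", "Early works to 1800",
})


def interpret(value: str, scheme: Optional[str] = None) -> Union[date, str]:
    """Interpret a value according to its (optional) encoding scheme."""
    if scheme in ("W3CDTF", "ISO8601"):
        # Full dates only (YYYY-MM-DD); W3CDTF also allows coarser forms such
        # as YYYY, which this sketch does not handle.
        return date.fromisoformat(value)
    if scheme == "LCSH":
        if value not in LCSH_SAMPLE:
            raise ValueError(f"not in the LCSH sample: {value!r}")
        return value
    # No scheme, or one this software does not know: the value is still an
    # "appropriate" string that can be indexed for discovery.
    return value


if __name__ == "__main__":
    print(interpret("2001-05-02", "W3CDTF").strftime("%B %d, %Y"))  # May 02, 2001
```

The last branch is the important design choice: an unrecognized scheme never makes the value unusable.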
Resource has Date [Revised] "2000-06-13" [ISO8601]
Resource has Subject "Languages -- Grammar" [LCSH]
Dumb-Down Principle for Qualifiers
• The fifteen elements should be usable and understandable with or without the qualifiers
• Qualifiers refine meaning (but may be harder to understand)
• Nouns can stand on their own without adjectives
• If your software encounters an unfamiliar qualifier, look it up -- or just ignore it!
• “has a” relations break the model
  – E.g., a creator has a hair color
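The dumb-down principle can be sketched as a filter that keeps the qualifiers a given application understands and silently drops the rest. The function names `simplify` and `dumb_down` and the tuple layout (element, refinement, scheme, value) are hypothetical choices for illustration; the sample values come from the Library of Congress map example later in these slides.

```python
from typing import Iterable, List, Optional, Set, Tuple

# (element, refinement, scheme, value); refinement and scheme may be None.
Qualified = Tuple[str, Optional[str], Optional[str], str]


def simplify(record: Iterable[Qualified], understood: Set[str]) -> List[Qualified]:
    """Keep the qualifiers this software understands and ignore the rest."""
    out = []
    for element, refinement, scheme, value in record:
        out.append((
            element,
            refinement if refinement in understood else None,
            scheme if scheme in understood else None,
            value,
        ))
    return out


def dumb_down(record: Iterable[Qualified]) -> List[Tuple[str, str]]:
    """Drop every qualifier: the nouns (elements) stand on their own."""
    return [(element, value) for element, _refinement, _scheme, value in record]


if __name__ == "__main__":
    record = [
        ("Date", "Created", "W3CDTF", "1996-04-17"),
        ("Subject", None, "LCSH", "Maps"),
        ("Coverage", "Spatial", "TGN", "New Jersey"),
    ]
    print(simplify(record, {"Created", "LCSH"}))
    print(dumb_down(record))
```

Because every qualifier only narrows meaning, each output is still a correct (if less specific) description of the resource.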
Resource has Date [Revised] "2000-06-13" [ISO8601]
Resource has Subject "Languages -- Grammar" [LCSH]
Test for “good” qualifiers: cover the qualifiers and ask:
– Does the statement still make sense?
– Is it still correct?
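“Incorrect” Qualification
Resource has Subject [audience] “pre-schoolers”
Resource has Creator [affiliation] “Cornell University”
Covering the qualifiers leaves “Resource has Subject pre-schoolers” and “Resource has Creator Cornell University”, and neither is correct: audience and affiliation describe something the resource or creator has, not a narrower kind of subject or creator.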
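The cover test can be approximated mechanically if software knows which refinements are genuine “is a” narrowings of which elements. A sketch, assuming a hypothetical `ALLOWED_REFINEMENTS` table (a small excerpt, not the DCMI list; `revised` is included only because these slides use it) and a hypothetical `passes_cover_test` function.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical excerpt: refinements that are genuine "is a" narrowings of an element.
ALLOWED_REFINEMENTS: Dict[str, FrozenSet[str]] = {
    "date": frozenset({"created", "modified", "revised"}),
    "relation": frozenset({"ispartof", "haspart"}),
    "description": frozenset({"abstract"}),
    "coverage": frozenset({"spatial", "temporal"}),
}


def passes_cover_test(element: str, refinement: Optional[str]) -> bool:
    """Approximate the cover test: accept a refinement only if it narrows its element."""
    if refinement is None:
        return True  # an unqualified statement trivially survives covering
    allowed = ALLOWED_REFINEMENTS.get(element.lower(), frozenset())
    return refinement.lower() in allowed


if __name__ == "__main__":
    for element, refinement in [("Date", "Revised"), ("Subject", "audience"),
                                ("Creator", "affiliation")]:
        print(element, refinement, passes_cover_test(element, refinement))
```

A table lookup cannot judge meaning, so this only flags qualifiers that nobody has declared to be narrowings; the human cover test remains the real check.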
Open questions in this model
• Are uncontrolled and unconstrained values really useful for discovery?
• Is it possible for an organization (DCMI) to control the evolution of a language?
• How can "simple discovery metadata" be combined with complex descriptions? Is there a notion of graceful degradation?
• Can DC serve as a lingua franca (mapping template) among more complex models?
Models for Deploying Metadata
• Embedded in the resource
  – Low deployment threshold
  – Limited flexibility, limited model
• Linked to from the resource
  – Using XLink
  – Is there only one source of metadata?
• Independent resource referencing the resource
  – Model of accessing the object through its surrogate
Syntax Alternatives: HTML
• Advantages:
  – Simple mechanism: META tags embedded in content
  – Widely deployed tools and knowledge
• Disadvantages:
  – Limited structural richness (won’t support hierarchical, tree-structured data or entity distinctions)
Dublin Core in HTML
• http://www.dublincore.org/documents/2000/08/15/dcq-html/
• HTML constructs
  – <link> to establish a pseudo-namespace
  – <meta> for metadata statements
    • name attribute for the DC element and optional element refinement (DC.element.refinement, e.g., DC.Date.Created)
    • content attribute for the element value
    • scheme attribute for the encoding scheme or controlled vocabulary
    • lang attribute for the language of the element value
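A sketch of generating these tags from the attributes listed above. `dc_meta` is a hypothetical helper; the attribute order (name, scheme, lang, content) is an arbitrary choice, and values are escaped with the standard library’s `html.escape` so quotes and ampersands cannot break the tag.

```python
from html import escape
from typing import Optional

# The <link> element that declares the DC pseudo-namespace.
DC_SCHEMA_LINK = '<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">'


def dc_meta(element: str, value: str, refinement: Optional[str] = None,
            scheme: Optional[str] = None, lang: Optional[str] = None) -> str:
    """Build one <meta> tag using the DC.element.refinement naming convention."""
    name = f"DC.{element}" + (f".{refinement}" if refinement else "")
    attrs = [("name", name)]
    if scheme:
        attrs.append(("scheme", scheme))   # encoding scheme or controlled vocabulary
    if lang:
        attrs.append(("lang", lang))       # language of the value itself
    attrs.append(("content", value))
    rendered = " ".join(f'{k}="{escape(v, quote=True)}"' for k, v in attrs)
    return f"<meta {rendered}>"


if __name__ == "__main__":
    print(DC_SCHEMA_LINK)
    print(dc_meta("Date", "2000-10-23", refinement="Created", scheme="W3CDTF"))
    print(dc_meta("Title", "negocio inusual", lang="es"))
```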
Dublin Core in HTML example
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">
<meta name="DC.Title" content="Business Unusual">
<meta name="DC.Title" lang="es" content="negocio inusual">
<meta name="DC.Creator" content="Carl Lagoze">
<meta name="DC.Subject" content="bibliographic control web cataloging">
<meta name="DC.Date.Created" scheme="W3CDTF" content="2000-10-23">
<meta name="DC.Format" content="text/html">
<meta name="DC.Identifier" content="http://lcweb.loc.gov/lagoze_paper.html">
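Reading such a header back is straightforward with the standard library’s `html.parser`. A minimal sketch: `DCMetaParser` and `parse_dc_meta` are hypothetical names, and the parser assumes names follow the DC.element[.refinement] convention, ignoring every other tag.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional

Statement = Dict[str, Optional[str]]


class DCMetaParser(HTMLParser):
    """Collect Dublin Core statements from <meta name="DC...."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.statements: List[Statement] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...> also matches.
        if tag != "meta":
            return
        a = dict(attrs)
        parts = (a.get("name") or "").split(".")
        if len(parts) < 2 or parts[0].upper() != "DC":
            return  # not a Dublin Core statement
        self.statements.append({
            "element": parts[1],
            "refinement": parts[2] if len(parts) > 2 else None,
            "scheme": a.get("scheme"),
            "lang": a.get("lang"),
            "value": a.get("content"),
        })


def parse_dc_meta(html_text: str) -> List[Statement]:
    parser = DCMetaParser()
    parser.feed(html_text)
    parser.close()
    return parser.statements
```

Splitting the name attribute is what recovers the element/refinement distinction that HTML itself has no structure for.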
Unqualified Dublin Core in XML
http://www.dublincore.org/documents/2000/11/dcmes-xml/
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF SYSTEM "http://dublincore.org/2000/12/01-dcmes-xml-dtd.dtd">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description rdf:about="http://www.ilrt.bristol.ac.uk/people/cmdjb/">
<dc:title>Dave Beckett's Home Page</dc:title>
<dc:creator>Dave Beckett</dc:creator>
<dc:publisher>ILRT, University of Bristol</dc:publisher>
<dc:date>2000-06-06</dc:date>
</rdf:Description>
</rdf:RDF>
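The same unqualified record can be read with `xml.etree.ElementTree`, which reports namespaced tags as `{namespace-URI}localname`. A minimal sketch with a hypothetical `read_dc_xml` function; it handles only the flat rdf:Description layout shown above, not RDF/XML in general.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"


def read_dc_xml(xml_text: str) -> Dict[str, Dict[str, List[str]]]:
    """Map each rdf:Description's rdf:about URI to its DC element values."""
    root = ET.fromstring(xml_text)
    result: Dict[str, Dict[str, List[str]]] = {}
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        about = desc.get(f"{{{RDF_NS}}}about")
        values: Dict[str, List[str]] = {}
        for child in desc:
            # Keep only elements in the DC namespace, keyed by local name.
            if child.tag.startswith(f"{{{DC_NS}}}"):
                local = child.tag[len(DC_NS) + 2:]
                values.setdefault(local, []).append((child.text or "").strip())
        result[about] = values
    return result
```

Values are collected into lists because any Dublin Core element may repeat.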
Example of Dublin Core Use
A map in the United States Library of Congress on-line American Memory Collection
Title
The name given to the resource
<meta name="DC.Title" content="Novi Belgii Novæque Angliæ: nec non partis Virginiæ tabula multis in locis emendata" lang="la">
Creator
An entity primarily responsible for making the content of the resource
<meta name="DC.Creator" content="Nicolaum Visscher">
Subject
The topic of the content of the resource
<meta name="DC.Subject" content="Middle Atlantic States" scheme="LCSH">
<meta name="DC.Subject" content="Maps" scheme="LCSH">
<meta name="DC.Subject" content="Early works to 1800" scheme="LCSH">
Description
An account of the content of the resource
<meta name="DC.Description.Abstract" content="An historical map showing the coast of New Jersey as perceived in the seventeenth century">
Publisher
An entity responsible for making the resource available
<meta name="DC.Publisher" content="Library of Congress, United States">
Contributor
An entity responsible for making contributions to the content of the resource.
<meta name="DC.Contributor" content="Historic Urban Plans">
Date
A date associated with an event in the lifecycle of the resource
<meta name="DC.Date.Created" content="1996-04-17" scheme="W3CDTF">
Type
The nature or genre of the content of the resource
<meta name="DC.Type" content="Image" scheme="DCMIType">
Format
The physical or digital manifestation of the resource
<meta name="DC.Format.Medium" content="image/gif" scheme="IMT">
<meta name="DC.Format.Extent" content="556K">
Identifier
An unambiguous reference to the resource in the current context
<meta name="DC.Identifier" content="http://loc.gov/coll1/img456.jpg" scheme="URI">
Source
A reference to a resource from which the present resource is derived.
<meta name="DC.Source" content="G3715 1685 .V5 1969 (LOC catalog #)">
Language
Language of the intellectual content of the object
<meta name="DC.Language" content="nl" scheme="ISO 639-2">
Relation
A reference to a related resource
<meta name="DC.Relation.isPartOf" content="http://lcweb2.loc.gov/ammem/gmdhtml/dsxpimg.html" scheme="URI">
Coverage
The extent or scope of the content of the resource
<meta name="DC.Coverage.Spatial" content="New Jersey" scheme="TGN">
<meta name="DC.Coverage.Temporal" content="1650" scheme="W3CDTF">
Rights
Information about rights in and over the resource
<meta name="DC.Rights" content="http://www.loc.gov/rights_statement.htm">
Distributed Content: The Metadata Challenge
• From fixed, contained physical artifacts to fluid, distributed digital objects
• Need for basis of trust and authenticity in network environment
• Decentralization and specialization of resource description and need for mapping formalisms
Multi-entity nature of object description
[Figure: one object described through several related entities: Photographer, Camera type, Software, Computer artist]
Understanding Metadata based on Query Capabilities
• Simple boolean tags?
  – Creator = “Tom Baker” and Title contains “Dublin Core”
• Agent, time, place questions?
  – Who was responsible for what, and when and where?
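A flat attribute/value record is enough for the first kind of query. A sketch over hypothetical in-memory records (dicts mapping element names to lists of values); the `matches` function and the first record’s title are placeholders invented for illustration. Note that there is nowhere in this structure to say when or where a creator acted, which is why the second kind of question does not fit.

```python
from typing import Dict, List

Record = Dict[str, List[str]]


def matches(record: Record, creator: str, title_phrase: str) -> bool:
    """Creator = <creator> AND Title contains <title_phrase> (case-insensitive)."""
    has_creator = creator in record.get("creator", [])
    has_title = any(title_phrase.lower() in t.lower() for t in record.get("title", []))
    return has_creator and has_title


if __name__ == "__main__":
    records: List[Record] = [
        {"creator": ["Tom Baker"], "title": ["A Dublin Core primer (hypothetical)"]},
        {"creator": ["Carl Lagoze"], "title": ["Business Unusual"]},
    ]
    print([r["title"] for r in records if matches(r, "Tom Baker", "Dublin Core")])
```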
Attribute/Value approaches to metadata…
Hamlet has a creator [playwright] “Shakespeare”
(subject, implied verb, metadata noun, metadata adjective, literal)
“The playwright of Hamlet was Shakespeare”
As a graph: R1 --dc:title--> “Hamlet”; R1 --dc:creator.playwright--> “Shakespeare”
…run into problems for richer descriptions…
Hamlet has a creator [birthplace] “Stratford”
“The playwright of Hamlet was Shakespeare, who was born in Stratford”
As a graph: R1 --dc:creator.playwright--> “Shakespeare”; R1 --dc:creator.birthplace--> “Stratford”
…because of their failure to model entity distinctions
R1 --title--> “Hamlet”
R1 --creator--> R2
R2 --name--> “Shakespeare”
R2 --birthplace--> “Stratford”
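The difference between the two graphs can be sketched with plain (subject, predicate, object) triples. The helper functions below are hypothetical; the point is that dumbing down the flat graph turns “Stratford” into a creator, while the entity-aware graph lets a query follow resource, creator, birthplace.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Flat attribute/value description: everything hangs off the resource R1.
flat: List[Triple] = [
    ("R1", "dc:title", "Hamlet"),
    ("R1", "dc:creator.playwright", "Shakespeare"),
    ("R1", "dc:creator.birthplace", "Stratford"),
]

# Entity-aware description: the creator is a node (R2) with its own properties.
entity: List[Triple] = [
    ("R1", "title", "Hamlet"),
    ("R1", "creator", "R2"),
    ("R2", "name", "Shakespeare"),
    ("R2", "birthplace", "Stratford"),
]


def objects(graph: List[Triple], subject: str, predicate: str) -> Set[str]:
    return {o for s, p, o in graph if s == subject and p == predicate}


def dumbed_down_creators(graph: List[Triple], resource: str) -> Set[str]:
    """Drop qualifiers from dc:creator.* in the flat graph: everything becomes a 'creator'."""
    return {o for s, p, o in graph if s == resource and p.split(".")[0] == "dc:creator"}


def creator_birthplaces(graph: List[Triple], resource: str) -> Set[str]:
    """Follow resource -creator-> agent -birthplace-> place in the entity graph."""
    return {place
            for agent in objects(graph, resource, "creator")
            for place in objects(graph, agent, "birthplace")}
```

Here `dumbed_down_creators(flat, "R1")` yields both “Shakespeare” and “Stratford”, the “has a” failure from the dumb-down slide, whereas `creator_birthplaces(entity, "R1")` yields only “Stratford”.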
Applying a Model-Centric Approach
• Formally define common entities and relationships underlying multiple metadata vocabularies
• Describe them (and their inter-relationships) in a simple logical model
• Provide the framework for extending these common semantics to domain and application-specific metadata vocabularies.
Events are key to understanding metadata relationships?
• Modeling implied events as first-class objects provides attachment points for common entities – e.g., agents, contexts (times & places), roles.
• Clarifying attachment points facilitates understanding and querying “who was responsible for what when”.
ABC/Harmony Event-aware metadata ontology
• Recognizing inherent lifecycle aspects of description (esp. of digital content)
• Modeling incorporates time (events and situations) as first-class objects
  – Supplies clear attachment points for agents, roles, existential properties
• Resource description as a “story-telling” activity
Resource-centric Metadata
Title: Anna Karenina
Author: Leo Tolstoy
Illustrator: Orest Vereisky
Translator: Margaret Wettlin
Date Created: 1877
Date Translated: 1978
Description: Adultery & Depression
Birthplace: Moscow
Birthdate: 1828
? Whose birthplace and birthdate? A flat record cannot say which agent they describe.
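[Figure: event-aware description of the same book]
– “creation” event, “1877”: “Leo Tolstoy” as “author” (with "Moscow" and “1828”), language “Russian”
– “translation” event, “1978”: “Margaret Wettlin” as “translator”, “Orest Vereisky” as “illustrator”, language “English”
– Work: “Anna Karenina”, described as “Tragic adultery and the search for meaningful love”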
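A minimal sketch of the event-centric idea, assuming hypothetical `Agent` and `Event` classes (this is not the ABC/Harmony schema itself). Birthplace and birthdate attach to the agent, and agents attach to events through roles, so “who was responsible for what, when” becomes a simple traversal. Only the author and translator are modeled.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Agent:
    name: str
    birthplace: str = ""
    birthdate: str = ""


@dataclass
class Event:
    kind: str       # e.g. "creation", "translation"
    date: str       # year as given on the slide
    work: str       # title of the work involved
    language: str   # language of the resulting version
    participants: List[Tuple[Agent, str]] = field(default_factory=list)  # (agent, role)


tolstoy = Agent("Leo Tolstoy", birthplace="Moscow", birthdate="1828")
wettlin = Agent("Margaret Wettlin")

events = [
    Event("creation", "1877", "Anna Karenina", "Russian", [(tolstoy, "author")]),
    Event("translation", "1978", "Anna Karenina", "English", [(wettlin, "translator")]),
]


def who_did_what_when(events: List[Event]) -> List[Tuple[str, str, str, str]]:
    """Answer 'who was responsible for what, and when' from the event list."""
    return [(agent.name, role, e.kind, e.date)
            for e in events for agent, role in e.participants]
```

The birthplace question from the resource-centric slide disappears: “Moscow” is a property of the `tolstoy` agent, not of the book.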
Queries over complex descriptive graphs
• Ability to ask questions like “show me all the translations of War and Peace between 1980 and 1990”
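The slide’s example query can be sketched as a filter over event records. Since the slides give no War and Peace data, this hypothetical sketch runs the same query against the Anna Karenina events from the previous diagram; the flattened `Event` record (one agent per event, integer years) and the `translations` function are simplifying assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    kind: str       # "creation", "translation", ...
    year: int
    work: str
    language: str
    agent: str
    role: str


def translations(events: List[Event], work: str, start: int, end: int) -> List[Event]:
    """Translation events of `work` whose year falls in [start, end], inclusive."""
    return [e for e in events
            if e.kind == "translation" and e.work == work and start <= e.year <= end]


events = [
    Event("creation", 1877, "Anna Karenina", "Russian", "Leo Tolstoy", "author"),
    Event("translation", 1978, "Anna Karenina", "English", "Margaret Wettlin", "translator"),
]

if __name__ == "__main__":
    for e in translations(events, "Anna Karenina", 1970, 1980):
        print(e.agent, e.language, e.year)
```

Asking about War and Peace between 1980 and 1990 is the same call with different arguments; it only works because the date hangs off a translation event rather than off the book.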