2009.09.29 chris poppe - metadata

ELIS – Multimedia Lab. Metadata: Starting Points, Priorities, Future Perspectives, and Notes from the Margin. Chris Poppe, Multimedia Lab, Department of Electronics and Information Systems, Faculty of Engineering, Ghent University

Upload: chris-poppe

Post on 06-May-2015

DESCRIPTION

Chris Poppe presents current and future metadata trends at a cultural heritage workshop.

TRANSCRIPT

Page 1: 2009.09.29   chris poppe - metadata

ELIS – Multimedia Lab

Metadata: Starting Points, Priorities, Future Perspectives, and Notes from the Margin

Chris Poppe
Multimedia Lab
Department of Electronics and Information Systems
Faculty of Engineering
Ghent University

Page 2: 2009.09.29   chris poppe - metadata

Multimedia Lab

• Multimedia Lab
  – Research group of Ghent University (Faculty of Engineering)
  – Multimedia
    • Video!
      – Coding
      – Processing
      – Transmission
      – Analysis
      – Adaptation
      – Annotation
      – …

Metadata, Chris Poppe
Levend Geheugen, Brussels, Belgium – September 29 2009

Page 3: 2009.09.29   chris poppe - metadata

Outline

• What is metadata?
• Metadata vs. tags?
  – Benefits/disadvantages?
• What is a metadata standard?
  – Why is it needed?
  – What does it look like?
  – What are the problems?
• What is the semantic web?
  – Web 2.0?
  – Web 3.0?
  – Semantic Web technologies?
• Conclusions

Page 4: 2009.09.29   chris poppe - metadata

Metadata

• Data describing data
• Museum for the history of sciences

Page 5: 2009.09.29   chris poppe - metadata

Metadata

• Data describing data
• Digital content
  – Resolution
  – DPI
  – Date/time created
  – Creator
  – Camera used
  – File format (JPG, BMP, GIF, PNG, …)
  – Location shot (GPS)
  – Copyright
  – Title
  – Genre
  – Rating
  – Comment
  – Keywords
  – Depicted event
  – …

Page 6: 2009.09.29   chris poppe - metadata

Use of Metadata

• Understanding of multimedia content
• Sharing
• Management
• Retrieval
  – Search
  – Browse
• Processing

Page 7: 2009.09.29   chris poppe - metadata

Metadata: tags

• Tag
  – Free text annotation
  – Keywords, terms, comments
  – Informal
  – Personal
  – Started as taxonomies or vocabularies used to describe content
  – Evolved into folksonomies
    • User-driven

Page 8: 2009.09.29   chris poppe - metadata

Taxonomies

• Top down
• Pre-defined structure
• Hierarchy
• Controlled vocabularies
• Expert

Page 9: 2009.09.29   chris poppe - metadata

Taxonomies

• Example
  – Dewey Decimal Classification
  – Library classification

Page 10: 2009.09.29   chris poppe - metadata

Folksonomy

• Folk + taxonomy
  – Free form text annotation
  – No predefined structure
  – No hierarchy
  – Users add metadata
  – Flat name space
  – Bottom up
• Two types:
  – Broad
  – Narrow

Page 11: 2009.09.29   chris poppe - metadata

Broad Folksonomy

• Tagging shared content
• Anyone can participate
• Examples

Page 12: 2009.09.29   chris poppe - metadata

Narrow Folksonomy

• Tagging your own content
• Tagging friends' content
  – No consolidation
  – No emerging vocabularies

Page 13: 2009.09.29   chris poppe - metadata

Tagging usage

• Navigation
  – Tag clouds
  – Organization
  – Hints

Page 14: 2009.09.29   chris poppe - metadata

Tagging: how to?

• Totally free
• Semi-structured
• Hinted

Page 15: 2009.09.29   chris poppe - metadata

Tagging problems

• Cultural differences: Genghis Khan, for some a hero, for others a criminal
• Communities of users can give different meanings to tags: Movie vs. Film vs. Cinema
• Language issues
• Ambiguity
• Misspelled tags (40% Flickr, 28% del.icio.us)
• Semantics of tags
  – Factual tags: what it is, what it is about: ‘image’, ‘article’, ‘blog’, …
  – Subjective tags: user’s opinion: ‘funny’, ‘hot’, ‘stupid’, …
  – Personal tags: self reference: ‘toread’, ‘mycomments’, …
  – Tag: “nothing to do with Brussels”

Page 16: 2009.09.29   chris poppe - metadata

Metadata

• Data describing data
• Digital content
  – Resolution
  – DPI
  – Date/time created
  – Creator
  – Camera used
  – File format (JPG, BMP, GIF, PNG, …)
  – Location shot (GPS)
  – Copyright
  – Title
  – Genre
  – Rating
  – Comment
  – Keywords
  – Depicted event
  – …

Page 17: 2009.09.29   chris poppe - metadata

Multimedia

• Compression and container formats
  – MP2, JPEG, MPEG-2, MXF, JPEG2000, AVI, AAC, H.264/MPEG-4 AVC, PNG, Motion JPEG2000, TIFF, MP4, MPEG, WAV, FLAC, VC-1, Ogg Vorbis, DivX, AIFF, GIF, JPEG-LS, Matroska, OGM/OGG, Windows Media Audio, DIRAC, 3GP, DV, FLV, Betacam, Realmedia, MOV, AC-3/Dolby Digital, Theora, ASF, TTA
  – Categories shown: video compression, audio compression, image compression, physical containers
• Standards for multimedia
  – Standards for metadata?

Page 18: 2009.09.29   chris poppe - metadata

Metadata Standard

• Standard which determines the structure of metadata

Metadata fields: resolution, DPI, date/time created, creator, camera used, file format (JPG, BMP, GIF, PNG, …), location shot (GPS), copyright, title, genre, rating, comment, keywords, depicted event, …

MODS (Metadata Object Description Schema)

<?xml version="1.0" encoding="UTF-8" ?>
<mods xmlns=http://www.loc.gov/mods/…
  <titleInfo>
    <title>De geruchten</title>
  </titleInfo>
  <name type="personal">
    <namePart>Claus, Hugo</namePart>
    <namePart type="date">1929-</namePart>
    <role>
      <text>creator</text>
    </role>
  </name>
  <typeOfResource>text</typeOfResource>
  <originInfo>… </originInfo>
  ...
</mods>

Page 19: 2009.09.29   chris poppe - metadata

XML

• XML (Extensible Markup Language)
  – Standardized by W3C (World Wide Web Consortium)
  – Language to define the structure of a document

<?xml version="1.0" encoding="UTF-8" ?>
<!-- This is a book list. -->
<boekenlijst>
  <boek categorie="thriller">
    <titel>Het Bernini Mysterie</titel>
    <auteur>Dan Brown</auteur>
  </boek>
  <boek categorie="woordenboek">
    <titel>Van Dale Frans-Nederlands</titel>
    <auteur />
  </boek>
</boekenlijst>

Parts labeled on the slide:
• XML element
• Attribute
• Values
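To make the three labeled parts concrete, here is a minimal sketch that reads the book list with Python's standard `xml.etree.ElementTree` module. The function name `list_books` and the English dictionary keys are illustrative choices, not part of the slides.

```python
import xml.etree.ElementTree as ET

# The book list from the slide (comment translated, Dutch element names kept).
BOOK_LIST_XML = """<?xml version="1.0" encoding="UTF-8" ?>
<!-- This is a book list. -->
<boekenlijst>
  <boek categorie="thriller">
    <titel>Het Bernini Mysterie</titel>
    <auteur>Dan Brown</auteur>
  </boek>
  <boek categorie="woordenboek">
    <titel>Van Dale Frans-Nederlands</titel>
    <auteur />
  </boek>
</boekenlijst>"""


def list_books(xml_text):
    """Return one dict per <boek> element (hypothetical helper)."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    books = []
    for boek in root.iter("boek"):            # XML element
        books.append({
            "category": boek.get("categorie"),  # attribute
            "title": boek.findtext("titel"),    # value (element text)
            # An empty <auteur /> yields "", which is mapped to None here.
            "author": boek.findtext("auteur") or None,
        })
    return books


for book in list_books(BOOK_LIST_XML):
    print(book)
```

The parser recovers the structure (elements, attributes, values) but knows nothing about what a "boek" or an "auteur" means; that is left to the application.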

Page 20: 2009.09.29   chris poppe - metadata

XML Schema

• XML Schema
  – Uses XML to denote the structure of a document

<?xml version="1.0" encoding="UTF-8" ?>
<!-- This is a book list. -->
<boekenlijst>
  <boek categorie="thriller">
    <titel>Het Bernini Mysterie</titel>
    <auteur>Dan Brown</auteur>
  </boek>
  <boek categorie="woordenboek">
    <titel>Van Dale Frans-Nederlands</titel>
    <auteur />
  </boek>
</boekenlijst>

• The XML schema determines:
  – Elements: boekenlijst, boek, titel, auteur
  – Order
  – Types (of values)

Page 21: 2009.09.29   chris poppe - metadata

Metadata Standard

• Describe structure of metadata using XML schema
• Textual specification explains semantics of the elements
  – titleInfo: “A word, phrase, character, or group of characters, normally appearing in a resource, that names it or the work contained in it.”

The MODS XML schema determines the structure of a MODS document such as:

<?xml version="1.0" encoding="UTF-8" ?>
<mods xmlns=http://www.loc.gov/mods/…
  <titleInfo>
    <title>De geruchten</title>
  </titleInfo>
  <name type="personal">…</name>
  <typeOfResource>text</typeOfResource>
  <originInfo>… </originInfo>
  ...
</mods>

Page 22: 2009.09.29   chris poppe - metadata

Use of Metadata Standards

• Shared information uses common structure
• Standard software can be used to parse information

Diagram: databases (DB) exchange MODS documents and so speak the same language. Each MODS document looks like:

<?xml version="1.0" encoding="UTF-8" ?>
<mods xmlns=http://www.loc.gov/mods/…
  <titleInfo>
    <title>De geruchten</title>
  </titleInfo>
  <name type="personal">…</name>
  <typeOfResource>text</typeOfResource>
  <originInfo>… </originInfo>
  ...
</mods>
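The benefit of a shared structure can be sketched as one parsing routine that works for records from any archive. The slides truncate the MODS namespace URI, so the sketch below does not guess it and uses a clearly labeled placeholder namespace instead; `summarize` and the second record (including its title and creator) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace (assumption): stands in for the truncated MODS URI.
PLACEHOLDER_NS = "http://example.org/mods-placeholder"
NS = {"m": PLACEHOLDER_NS}

RECORD_FROM_ARCHIVE_A = f"""<mods xmlns="{PLACEHOLDER_NS}">
  <titleInfo><title>De geruchten</title></titleInfo>
  <name type="personal">
    <namePart>Claus, Hugo</namePart>
    <namePart type="date">1929-</namePart>
    <role><text>creator</text></role>
  </name>
  <typeOfResource>text</typeOfResource>
</mods>"""

# Hypothetical record from a second archive, using the same structure.
RECORD_FROM_ARCHIVE_B = f"""<mods xmlns="{PLACEHOLDER_NS}">
  <titleInfo><title>Example title</title></titleInfo>
  <name type="personal">
    <namePart>Example, Author</namePart>
    <role><text>creator</text></role>
  </name>
  <typeOfResource>still image</typeOfResource>
</mods>"""


def summarize(record_xml):
    """Extract title, creator and resource type from one record."""
    root = ET.fromstring(record_xml)
    creator = None
    for name in root.iterfind("m:name", NS):
        if name.findtext("m:role/m:text", namespaces=NS) == "creator":
            # Take the first namePart without a type attribute (the name itself).
            creator = next(
                (p.text for p in name.iterfind("m:namePart", NS)
                 if p.get("type") is None),
                None,
            )
    return {
        "title": root.findtext("m:titleInfo/m:title", namespaces=NS),
        "creator": creator,
        "type": root.findtext("m:typeOfResource", namespaces=NS),
    }


for record in (RECORD_FROM_ARCHIVE_A, RECORD_FROM_ARCHIVE_B):
    print(summarize(record))
```

Because both archives follow the same structure, no archive-specific code is needed; with two different formats, a mapping step would have to come first.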

Page 23: 2009.09.29   chris poppe - metadata

Metadata Standards

• Different metadata standards exist!
  – Different domains
  – Different communities
  – Different formats
  – Different focus

Page 24: 2009.09.29   chris poppe - metadata

Detection and Representation of Moving Objects for Video Surveillance, Chris Poppe
Ghent, Belgium – June 9 2009

Problem Metadata Standards

• Different metadata standards can describe the same thing
• But in a different way!

Metadata example 1: CVML (Computer Vision Markup Language)

<object id="0">
  <box xc="77" yc="73" w="21" h="16"/>
</object>

Box: “Coordinates of the centre and the dimensions of the bounding box of a detected object in pixels.”

Metadata example 2: VS7 (Video Surveillance Schema)

<LLID ="LLID1">
  <Mask>
    <BB mp7:dim="4">67 65 88 91</BB>
  </Mask>
</LLID>

BB: “Coordinates of a rectangular segment.”
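A mapping between the two descriptions could look like the sketch below. It assumes that the four BB values are corner coordinates in the order (x_min, y_min, x_max, y_max); the quoted definition does not confirm this, and the two slide snippets need not describe the identical box. The names `CenterBox`, `center_to_corners` and `corners_to_center` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CenterBox:
    """Centre-based box, as in the CVML example (xc, yc, w, h in pixels)."""
    xc: float
    yc: float
    w: float
    h: float


def center_to_corners(box):
    """Convert to (x_min, y_min, x_max, y_max); corner order is an assumption."""
    half_w = box.w / 2
    half_h = box.h / 2
    return (box.xc - half_w, box.yc - half_h, box.xc + half_w, box.yc + half_h)


def corners_to_center(x_min, y_min, x_max, y_max):
    """Inverse mapping, back to the centre-based description."""
    return CenterBox(
        xc=(x_min + x_max) / 2,
        yc=(y_min + y_max) / 2,
        w=x_max - x_min,
        h=y_max - y_min,
    )


cvml_box = CenterBox(xc=77, yc=73, w=21, h=16)
corners = center_to_corners(cvml_box)
print(corners)
print(corners_to_center(*corners))
```

Neither schema states the relation between the two forms; the conversion lives only in code like this, which is why every pair of standards needs its own mapping.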

Page 25: 2009.09.29   chris poppe - metadata

Problems Metadata Standard

• Current metadata standards define the structure of metadata
• Mappings are needed to use different standards within one system
• A metadata standard does not solve everything!
  – For instance: DC (Dublin Core) creator property
    • Creator="Shakespeare, William"
    • Creator="William Shakespeare"
    • Creator="Shakespeare"
    • Creator="W. Shakespare"
  – Same problems as tagging can occur
• Lack of ways to describe semantics of metadata
  – Currently plain text
  – Not machine readable
• Multimedia content shifts to online repositories
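The creator example can be made tangible with a small sketch: to a program, the four values are four different strings, and a naive clean-up (a hypothetical `naive_normalize` that only reorders "Last, First") removes just one of the variants.

```python
def naive_normalize(creator):
    """Turn 'Last, First' into 'First Last'; leave everything else untouched."""
    if "," in creator:
        last, first = (part.strip() for part in creator.split(",", 1))
        return f"{first} {last}"
    return creator.strip()


# The four creator values from the slide, spelled exactly as shown there.
CREATORS = [
    "Shakespeare, William",
    "William Shakespeare",
    "Shakespeare",
    "W. Shakespare",
]

distinct_raw = set(CREATORS)
distinct_normalized = {naive_normalize(c) for c in CREATORS}

print(len(distinct_raw), "distinct raw values")
print(len(distinct_normalized), "distinct values after naive normalization")
```

The structure of the field is standardized, but its content is still free text, so abbreviations and spelling variants survive any purely structural check.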

Page 26: 2009.09.29   chris poppe - metadata

Semantic Web ?.0

Page 27: 2009.09.29   chris poppe - metadata

The Syntactic Web

• Consider a typical web page:
• Mark-up consists of:
  – Rendering information (e.g., font size and colour)
  – Hyper-links to related content
• Semantic content is accessible to humans but not (easily) to computers…

Page 28: 2009.09.29   chris poppe - metadata

Impossible (?) using the Syntactic Web…

• Complex queries involving background knowledge
  – Give me the telephone number of the responsible person within Multimedia Lab of the demo about metadata applications
• Locating information in data repositories
  – Travel enquiries
  – Prices of goods and services
  – Results of human genome experiments
• Finding and using “web services”
  – Visualize surface interactions between two proteins
• Delegating complex tasks to web “agents”
  – Book me a holiday next weekend somewhere warm, not too far away, and where they speak French or English

Page 29: 2009.09.29   chris poppe - metadata

Semantic Web Technologies

• Technologies developed by the World Wide Web Consortium (W3C)
• Vision: the Web as universal medium for data, information and knowledge exchange
• HTML, XML -> RDF, RDFS, OWL, …
• Technologies to interconnect, exchange information
  – Applicable for metadata also!

Page 30: 2009.09.29   chris poppe - metadata

Why is XML not enough

• http://www.w3.org/DesignIssues/RDF-XML.html (Tim Berners-Lee)
• Try to express “The author of the note is Tim” in XML

<author>
  <uri>note</uri>
  <name>Tim</name>
</author>

<document href="note">
  <author>Tim</author>
</document>

<document uri="note" author="Tim" />

• For a person, the three representations mean the same, but NOT for a machine!
  – XML contains structures only, no semantics
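A short sketch of why the three snippets differ for a machine: parsed with Python's standard `xml.etree.ElementTree`, each yields a different tree shape, and each needs its own hand-written accessor to recover "Tim". The helper names (`shape`, `author_v1` to `author_v3`) are hypothetical.

```python
import xml.etree.ElementTree as ET

# The three representations of "The author of the note is Tim" from the slide.
REPRESENTATIONS = [
    "<author><uri>note</uri><name>Tim</name></author>",
    '<document href="note"><author>Tim</author></document>',
    '<document uri="note" author="Tim" />',
]


def shape(element):
    """Describe a tree by tag names, attribute names and child shapes."""
    return (
        element.tag,
        tuple(sorted(element.attrib)),
        tuple(shape(child) for child in element),
    )


roots = [ET.fromstring(text) for text in REPRESENTATIONS]
shapes = [shape(root) for root in roots]


# One accessor per representation: the parser cannot tell they mean the same.
def author_v1(root):
    return root.findtext("name")


def author_v2(root):
    return root.findtext("author")


def author_v3(root):
    return root.get("author")


for s in shapes:
    print(s)
print(author_v1(roots[0]), author_v2(roots[1]), author_v3(roots[2]))
```

The knowledge that all three accessors return "the author" sits in the programmer's head, not in the documents.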

Page 31: 2009.09.29   chris poppe - metadata

RDF

• RDF (Resource Description Framework)
• Triples: subject – predicate – object
• URI to identify resources
• “The author of the note is Tim”
  – Note –hasAuthor→ Tim
• Serialization in XML:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Note rdf:about="http://www.example.org/#note">
    <hasAuthor rdf:resource="http://www.example.org/#Tim"/>
  </Note>
</rdf:RDF>
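The triple model can be sketched without any RDF library: a statement is a (subject, predicate, object) tuple of URIs. Placing `hasAuthor` in the `http://www.example.org/#` namespace is an assumption made for this sketch (the slide leaves the property unqualified), and `Triple`, `objects` and `to_ntriples` are hypothetical names. The output line follows the N-Triples syntax for URI-only statements.

```python
from typing import NamedTuple

EX = "http://www.example.org/#"  # example namespace used on the slide


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# "The author of the note is Tim" as a single statement.
GRAPH = [Triple(EX + "note", EX + "hasAuthor", EX + "Tim")]


def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [t.obj for t in graph
            if t.subject == subject and t.predicate == predicate]


def to_ntriples(triple):
    """Serialize one URI-only triple as an N-Triples line."""
    return f"<{triple.subject}> <{triple.predicate}> <{triple.obj}> ."


print(objects(GRAPH, EX + "note", EX + "hasAuthor"))
print(to_ntriples(GRAPH[0]))
```

Whatever the serialization (RDF/XML, N-Triples, or others), the statement stays the same triple, which is what the three ad hoc XML variants could not guarantee.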

Page 32: 2009.09.29   chris poppe - metadata

RDFS

• RDF Schema
• Standardized vocabulary for describing concepts
• Introduces classes and instances
• Subclasses, sub properties
  – Possible to define hierarchies!

Diagram:
  – Note1 –hasAuthor→ Tim
  – Note1 –type→ Class Note
  – Tim –type→ Class Person

Page 33: 2009.09.29   chris poppe - metadata

OWL

• Web Ontology Language, W3C recommendation (2004)
• Provides richer vocabulary
• Define advanced relations
  – Data typing
  – Cardinalities
  – Rich typing of properties
  – …
• Example:

<owl:ObjectProperty rdf:ID="isAuthorFrom">
  <owl:inverseOf rdf:resource="#hasAuthor"/>
</owl:ObjectProperty>

• Allows for intelligent reasoning
• Complex ontologies can be created

Diagram:
  – Note1 –hasAuthor→ Tim
  – Tim –isAuthorFrom→ Note1
  – Note1 –type→ Class Note
  – Tim –type→ Class Person
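What a reasoner can do with the `inverseOf` declaration can be sketched in a few lines. This is a toy stand-in for a real OWL reasoner, covering only this one construct; `apply_inverse` is a hypothetical helper and the triples use short names instead of URIs.

```python
def apply_inverse(triples, prop, inverse):
    """Add the statements implied by 'inverse is the inverse of prop'."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        if predicate == prop:
            inferred.add((obj, inverse, subject))
        elif predicate == inverse:
            inferred.add((obj, prop, subject))
    return inferred


# Only the hasAuthor statement is asserted explicitly.
ASSERTED = {("Note1", "hasAuthor", "Tim")}

ALL_FACTS = apply_inverse(ASSERTED, "hasAuthor", "isAuthorFrom")
for fact in sorted(ALL_FACTS):
    print(fact)
```

The second statement (Tim isAuthorFrom Note1) was never entered by anyone; it follows from the ontology.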

Page 34: 2009.09.29   chris poppe - metadata

Ontology

• Information in a domain is structured using an ontology
• A data model that represents a set of concepts and relations amongst these concepts within a specific domain
• Thesaurus
  – Dictionary
    • Synonyms
• Taxonomy
  – Hierarchy
    • Subclass and siblings
• Ontology
  – Concepts
  – Relations

Page 35: 2009.09.29   chris poppe - metadata

Ontology (using OWL)

• Example: ontology for domain of science
  – Class: Person, with DatatypeProperty “birth date”
  – Class: Scientist, subClassOf Person
  – Individual: “Joseph Plateau” (a Scientist), birth date “14/10/1801”
• OWL constructs
  – Class
  – DatatypeProperty
  – subClassOf
  – Individual
  – …
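The science example can be sketched with plain dictionaries to show what `subClassOf` buys: an individual typed as Scientist is also a Person, without that being stated. The dictionary layout and the `classes_of` helper are hypothetical, and this is a toy illustration rather than an OWL implementation.

```python
# Class hierarchy: child class -> parent class.
SUBCLASS_OF = {"Scientist": "Person"}

# Individuals with their asserted class and datatype properties.
INDIVIDUALS = {
    "Joseph Plateau": {"type": "Scientist", "birth date": "14/10/1801"},
}


def classes_of(individual):
    """Asserted class plus every superclass reached via subClassOf."""
    current = INDIVIDUALS[individual]["type"]
    result = [current]
    while current in SUBCLASS_OF:
        current = SUBCLASS_OF[current]
        result.append(current)
    return result


print(classes_of("Joseph Plateau"))
print(INDIVIDUALS["Joseph Plateau"]["birth date"])
```

A query for all persons would therefore return Joseph Plateau even though only "Scientist" was recorded for him.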

Page 36: 2009.09.29   chris poppe - metadata

Semantic Web Technologies

• SPARQL Protocol And RDF Query Language (SPARQL)
  – SQL-like language for RDF
  – Example: search for all the notes of Tim
    • SELECT ?x WHERE { ?x :hasAuthor :Tim }
• Rule Interchange Format (RIF)
  – Example rule: if Tim is the author of the note, he is also a contributor
  – Goal is to create an interchange format for different rule languages and inference engines
  – Closely related to ontologies
    • Rules combine information and derive new information on top of ontologies
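Both ideas on this slide can be imitated on a handful of triples: a triple pattern with a variable (what the SPARQL query expresses) and a rule that derives new statements. This is a toy matcher, not a SPARQL or RIF engine; `match`, `apply_author_rule`, the property name `hasContributor` and the data beyond Note1/Tim are hypothetical.

```python
# Toy data: Note2, Note3 and Ann are made up for illustration.
TRIPLES = [
    ("Note1", "hasAuthor", "Tim"),
    ("Note2", "hasAuthor", "Tim"),
    ("Note3", "hasAuthor", "Ann"),
]


def match(triples, pattern):
    """Return variable bindings for one triple pattern; '?name' is a variable."""
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


def apply_author_rule(triples):
    """Rule: if ?who is the author of ?doc, ?who is also a contributor."""
    derived = list(triples)
    for b in match(triples, ("?doc", "hasAuthor", "?who")):
        fact = (b["?doc"], "hasContributor", b["?who"])
        if fact not in derived:
            derived.append(fact)
    return derived


# Counterpart of: SELECT ?x WHERE { ?x :hasAuthor :Tim }
print(match(TRIPLES, ("?x", "hasAuthor", "Tim")))
print(apply_author_rule(TRIPLES))
```

The query only reads what is stated, while the rule adds statements that later queries can also find.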

Page 37: 2009.09.29   chris poppe - metadata

Semantic Web Technologies

• Data on the web can be linked to each other
  – Example: ontology on Brussels
    • DBpedia.org
  – Browsing:
    • Brussels -> cityofbirth -> Raymond_Goethals -> managerclubs -> RSC Anderlecht …
  – Querying: find all people born in Brussels before 1930
  – Reasoning: if a person was born in Brussels, he was also born in Belgium
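The querying and reasoning bullets can be illustrated on a few made-up records. The sketch does not contact DBpedia: the people (PersonA to PersonC), their birth data and all function names are hypothetical.

```python
# Hypothetical people; not taken from DBpedia.
PEOPLE = {
    "PersonA": {"birth_place": "Brussels", "birth_year": 1921},
    "PersonB": {"birth_place": "Brussels", "birth_year": 1950},
    "PersonC": {"birth_place": "Ghent", "birth_year": 1910},
}

# Background knowledge: place -> larger place that contains it.
LOCATED_IN = {"Brussels": "Belgium", "Ghent": "Belgium"}


def birth_places(person):
    """Reasoning: born in a place implies born in every containing place."""
    place = PEOPLE[person]["birth_place"]
    places = [place]
    while place in LOCATED_IN:
        place = LOCATED_IN[place]
        places.append(place)
    return places


def born_in_before(place, year):
    """Querying: everyone born in `place` (directly or implied) before `year`."""
    return sorted(
        name for name, data in PEOPLE.items()
        if place in birth_places(name) and data["birth_year"] < year
    )


print(born_in_before("Brussels", 1930))
print(born_in_before("Belgium", 1930))
```

The second query only works because of the reasoning step: no record states "Belgium" as a birth place.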

Page 38: 2009.09.29   chris poppe - metadata

Semantic Web Technologies

Page 39: 2009.09.29   chris poppe - metadata

Semantic Web ?.0

Page 40: 2009.09.29   chris poppe - metadata

Conclusions

• Use metadata standards!
  – Allows interchange
  – Structures the metadata
• When no standard is sufficient
  – Apply proprietary format
  – Structures the metadata
• If tagging is needed for search/browsing/retrieval
  – Provide fixed structure
    • E.g., who, what, where, when, …
  – Provide fixed vocabulary
    • Thesaurus
    • Hierarchy
    • Ontology for advanced reasoning
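The advice to provide a fixed vocabulary can be sketched as a small thesaurus that maps free tags onto preferred terms, with fuzzy matching from Python's standard `difflib` module to catch misspellings. The vocabulary, the choice of "film" as preferred term, the 0.75 cutoff and the helper `normalize_tag` are all assumptions for illustration.

```python
import difflib

# Hypothetical thesaurus: preferred term -> accepted variants.
THESAURUS = {
    "film": {"film", "movie", "cinema"},
}

# Reverse lookup: variant -> preferred term.
LOOKUP = {variant: preferred
          for preferred, variants in THESAURUS.items()
          for variant in variants}


def normalize_tag(tag):
    """Map a free tag to a preferred term, or None if it is not recognized."""
    cleaned = tag.strip().lower()
    if cleaned in LOOKUP:
        return LOOKUP[cleaned]
    # Tolerate small misspellings by picking the closest known variant.
    close = difflib.get_close_matches(cleaned, list(LOOKUP), n=1, cutoff=0.75)
    return LOOKUP[close[0]] if close else None


for raw in ["Movie", "Cinema", "moive", "toread"]:
    print(raw, "->", normalize_tag(raw))
```

Tags that fall outside the vocabulary come back as `None`, so an application can ask the user to pick from suggestions instead of storing yet another variant.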

Page 41: 2009.09.29   chris poppe - metadata

Questions?