owl, ontologies & textguus/talks/06-otm.pdf · kersen, marco de niet, borys omelayenko, jacco...

OWL, Ontologies & Text Challenges from the cultural-heritage domain Guus Schreiber Free University Amsterdam


OWL, Ontologies & Text

Challenges from the cultural-heritage domain

Guus Schreiber
Free University Amsterdam


Overview

Ontologies in general (brief)
W3C work on ontologies and ontology engineering for the Semantic Web (brief)
Use cases involving ontologies & text (& other media)
  – Based on cultural-heritage projects we are involved in


Acknowledgements

MultimediaN E-Culture Project:
  – Alia Amin, Mark van Assem, Victor de Boer, Lynda Hardman, Michiel Hildebrand, Laura Hollink, Zhisheng Huang, Janneke van Kersen, Marco de Niet, Borys Omelayenko, Jacco van Ossenbruggen, Ronny Siebes, Jos Taekema, Jan Wielemaker, Bob Wielinga
CHOICE Project @ Sound & Vision
  – Hennie Brugman, Luit Gazendam, Veronique Malaise, Johan Oomen, Mettina Veenstra
MuNCH project @ Sound & Vision
  – Laura Hollink, Bouke Hunning, Michiel van Liempt, Johan Oomen, Maarten de Rijke, Arnold Smeulders, Cees Snoek, Marcel Worring


Semantics for the Web: some challenges

Machine-processable representation of semantic information
Defining semantics in an OPEN environment
  – Adding semantics to other people’s semantics
  – Ability for everyone to contribute
Ability to define mappings between semantic representations
  – There is no uniform way to classify the world!


The notion of ontology (as currently used in computer science)

The Semantic Web needs sets of shared concepts
These sets of concepts are called “ontologies”
It is hard and time-consuming to develop ontologies
Therefore, the Semantic Web developers are looking for existing ontologies, vocabularies, taxonomies


Ontologies and data models

Main difference with data models is not the content, but the purpose (generalizes over applications)
You cannot see the difference by just looking at the syntax!
A conceptual model written in an ontology language is not necessarily an ontology!


Example “ontologies” for SW applications

Domain-specific vocabularies
  – Medicine: UMLS, SNOMED, Galen
  – Art history: AAT, ULAN
  – Geography: TGN
Generic ontologies
  – Top-level categories (reminiscent of Aristotelian categories)
  – Lexical vocabularies: WordNet
  – Units and dimensions, time ontology
  – Currencies, country codes, …


Good and bad ontologies?!

Good ontologies are used
Good ontologies represent some form of consensus in a community
Good ontologies are maintained
Good ontologies do not need to be complex
Good ontologies may contain “mistakes”


RDF/OWL language constructs

classes and individuals
subclasses
properties
subproperties
domain/range of properties
XML Schema datatypes

equality, inequality
inverse, transitive, symmetric, functional properties
property constraints: cardinality, allValuesFrom, someValuesFrom
conjunction, disjunction, negation of classes
hasValue, enumerated type


RDF/OWL family of languages

OWL Full is a vocabulary extension of RDF.
The RDF restrictions in OWL DL are there for good technical reasons.
Time will have to prove whether there is a place for OWL Lite or some other OWL subset.
RDF/OWL: one can view it as an historical artefact that these are not grouped under the same acronym.


Is RDF/OWL just another datamodelling/KR language?

Key differences:
  – All classes/properties/individuals have a URI as identifier
  – RDF/XML exchange syntax enables interoperability
XML features
  – UTF-8 character set
  – Support for multilinguality
  – Use of XML Schema datatypes: numeric, date, time, etc.
For the rest: RDF/OWL is a state-of-the-art concept language


Semantic Web Best Practices and Deployment Working Group

Objective: support for semantic-web application developers
Focus on “low hanging fruit”
Publishing key ontologies/vocabularies
Development guidelines, ontology-design patterns, repositories, links to related techniques, …


Ontology engineering patterns

Best practices for frequently occurring modeling problems
WG documents outline alternatives with pros and cons
Notes:
  – Classes as values
  – N-ary relations
  – Specification of value sets
  – Part-of


Metamodelling

OWL DL requires strict separation of classes and instances
But on the Semantic Web my instances may be your classes!
Metamodelling features especially required in vocabulary/ontology mapping and/or interpretation
Cf. Protégé metamodelling facilities
OWL 1.1 (not standardized) allows limited metamodelling within OWL DL scope


Example: WordNet

Class(LexicalConcept)
Class(Noun subClassOf(LexicalConcept))
Property(hyponymOf
    domain(LexicalConcept)
    range(LexicalConcept))
Individual(1000768
    type(LexicalConcept)
    wordForm(Human))

Problem: how to use the hyponym hierarchy as a subclass hierarchy?


RDF solution: use metamodelling

subClassOf(LexicalConcept Class)
subPropertyOf(hyponymOf subClassOf)
subPropertyOf(wordForm rdfs:label)

Corresponds to our intuition that WordNet model is a metamodel
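
A minimal sketch of what these axioms entail, assuming a toy in-memory set of triples rather than a real RDF store. The `entail` helper, the `wn:` prefix and the identifiers `wn:organism` and `wn:living_thing` are hypothetical; only `1000768` and the three axioms come from the slides. The sketch applies two RDFS rules: statements made with a subproperty also hold for its superproperty, and subClassOf is transitive.

```python
SUBPROP = "rdfs:subPropertyOf"
SUBCLASS = "rdfs:subClassOf"


def entail(triples):
    """Close a set of (subject, predicate, object) triples under two RDFS rules."""
    closed = set(triples)
    while True:
        new = set()
        # Rule 1: (p subPropertyOf q) and (s p o) entail (s q o).
        sub_props = {(s, o) for (s, p, o) in closed if p == SUBPROP}
        for (s, p, o) in closed:
            for (sub, sup) in sub_props:
                if p == sub:
                    new.add((s, sup, o))
        # Rule 2: subClassOf is transitive.
        sub_classes = [(s, o) for (s, p, o) in closed if p == SUBCLASS]
        for (a, b) in sub_classes:
            for (b2, c) in sub_classes:
                if b == b2:
                    new.add((a, SUBCLASS, c))
        if new <= closed:          # nothing new was derived: fixpoint reached
            return closed
        closed |= new


triples = {
    # The three metamodelling axioms from the slide.
    ("wn:LexicalConcept", SUBCLASS, "rdfs:Class"),
    ("wn:hyponymOf", SUBPROP, SUBCLASS),
    ("wn:wordForm", SUBPROP, "rdfs:label"),
    # Instance data; the two target synsets are made up for illustration.
    ("wn:1000768", "wn:wordForm", "Human"),
    ("wn:1000768", "wn:hyponymOf", "wn:organism"),
    ("wn:organism", "wn:hyponymOf", "wn:living_thing"),
}

closed = entail(triples)
print(("wn:1000768", SUBCLASS, "wn:living_thing") in closed)
print(("wn:1000768", "rdfs:label", "Human") in closed)
```

With these axioms the hyponym hierarchy can be queried as an ordinary subclass hierarchy, and word forms show up as labels, without rewriting the WordNet data itself.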


Thesauri and ontologies

Semantic Web Challenge showed that thesauri are important resources for SW applications
Typically weak semantic structure
Approach in W3C Semantic Web Best Practices WG:
  – Phase 1: “as-is” conversion
  – Phase 2: additional ontological interpretations/extensions


New W3C work: Semantic Web Deployment Working Group

Mission to help in vocabulary deployment
Chartered to standardize SKOS
Pattern for RDF/OWL representation of (ISO-compliant) thesauri
Guidelines for adding semantics to existing vocabularies


MultimediaN Pilot E-Culture


Hypothesis

Semantic Web technology is in particular useful in knowledge-rich domains

or formulated differently

If we cannot show added value in knowledge-rich domains, then it may have no value at all


[Figure: digital heritage collections with semantic annotation and semantic search at the centre, surrounded by the components below. Legend: baseline, enhanced features, new features.]

Natural-language processing: automatic annotation (text strings → concepts)
Distributed collections: cultuurwijzer.nl, OAI-based access
Reasoning support: time/space reasoning
Web interface: support for web collections
Presentation facilities: semantic presentation, device-specific
Interoperability: XML/RDF/OWL
Scalability: > 10,000,000 triples
Ontologies: WordNet, AAT, TGN, ULAN, Dutch labels
Search strategies: sibling search, semantic distance
Dublin Core: specializations, dumb-down


Use of thesauri

RDF/OWL data models of Getty thesauri– Issues: scope, preserving structure

WordNet: W3C SWBPD work
  http://www.w3.org/TR/wordnet-rdf/

Multilingualism– Dutch version of AAT

Existing collection metadata are parsed to find matches in thesauri (e.g. creator name => ULAN entry)
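
A minimal sketch of the last point, matching a creator string from collection metadata against thesaurus labels. The identifiers `ulan:example-1` and `ulan:example-2`, the label lists and all function names are hypothetical placeholders, not real ULAN data or an existing API. The only technique shown is label normalisation followed by exact lookup.

```python
import unicodedata


def normalize(name):
    """Lower-case, strip accents and turn 'SURNAME, Given' into 'given surname'."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    if "," in text:
        surname, given = text.split(",", 1)
        text = f"{given} {surname}"
    return " ".join(text.lower().split())


# Hypothetical thesaurus excerpt: concept identifier -> known labels.
THESAURUS = {
    "ulan:example-1": ["Matisse, Henri", "Henri Matisse"],
    "ulan:example-2": ["Derain, André", "André Derain"],
}


def build_index(thesaurus):
    """Map every normalised label to the set of concepts that carry it."""
    index = {}
    for concept, labels in thesaurus.items():
        for label in labels:
            index.setdefault(normalize(label), set()).add(concept)
    return index


def match_creator(value, index):
    """Return candidate concepts for a metadata value; empty set if none match."""
    return index.get(normalize(value), set())


INDEX = build_index(THESAURUS)
print(match_creator("MATISSE, Henri", INDEX))
print(match_creator("DERAIN, Andre", INDEX))
```

Returning a set rather than a single identifier keeps ambiguous names visible, so that a person or a later disambiguation step can choose among the candidates.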


WordNet synsets, senses and words have URIs


On-line demo
http://e-culture.multimedian.nl


Use case: Asian chairs

User has found an image of an Asian chair

Annotation:
  ex:image vra:stylePeriod aat:Guangxu .

How can we find images of Asian chairs from the same historical period?


AAT info on Guangxu


Observations

Many queries require time/space knowledge, either absolute or abstracted
For the chair image we can establish
  – Country = China (link Chinese => China)
  – Period = 1644-1911 (from Qing description)
Technology requirements:
  – Thesauri relating time/space concepts
  – NLP for unstructured descriptions
  – Time/space reasoning techniques
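
A minimal sketch of the time/space reasoning step for the chair use case, assuming each style period has been given a year interval and a link to a broader period. Only the Qing interval (1644-1911) is taken from the slide. The other intervals, the image identifiers and the helper names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Period:
    name: str
    start: int                      # first year, inclusive
    end: int                        # last year, inclusive
    broader: Optional[str] = None   # name of the enclosing period, if any


PERIODS = {
    "Qing": Period("Qing", 1644, 1911),                        # from the slide
    "Guangxu": Period("Guangxu", 1875, 1908, broader="Qing"),  # assumed years
    "Kangxi": Period("Kangxi", 1662, 1722, broader="Qing"),    # assumed years
    "Edo": Period("Edo", 1603, 1868),                          # assumed years
    "Song": Period("Song", 960, 1279),                         # assumed years
}

# Hypothetical annotations: image -> (style period, country).
IMAGES = {
    "ex:chair1": ("Guangxu", "China"),
    "ex:chair2": ("Kangxi", "China"),
    "ex:chair3": ("Edo", "Japan"),
    "ex:chair4": ("Song", "China"),
}


def root(name):
    """Abstract a period to its most general enclosing period."""
    while PERIODS[name].broader is not None:
        name = PERIODS[name].broader
    return name


def overlaps(a, b):
    """True if two closed year intervals share at least one year."""
    return a.start <= b.end and b.start <= a.end


def similar(image_id):
    """Other images from the same country whose abstracted period overlaps."""
    period, country = IMAGES[image_id]
    target = PERIODS[root(period)]
    return sorted(
        other
        for other, (p, c) in IMAGES.items()
        if other != image_id and c == country and overlaps(PERIODS[root(p)], target)
    )


print(similar("ex:chair1"))
```

The Edo chair overlaps the Qing interval in time but is excluded on country, and the Song chair matches on country but not on time, which is why both kinds of knowledge are needed.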


Use case: existing annotations

MATISSE, Henri
Le bonheur de vivre (The Joy of Life)
1905-1906
Oil on canvas, 69 1/8 x 94 7/8 in. (175 x 241 cm)
Barnes Foundation, Merion, PA


Textual annotation mapped to thesauri terms


Use case: how can we find this other Fauve painting?

DERAIN, Andre
The Turning Road, L'Estaque, 1906
Oil on canvas, 51 x 76 3/4 in. (129.5 x 195 cm)
Museum of Fine Arts, Houston, Texas


Issues w.r.t. the use case

Parse annotation to find matches with thesauri terms
  – E.g. match artists to ULAN individuals
Artists-style links
  – AAT contains styles; ULAN contains artists, but there is no link
      • Learn link from corpora
      • Derive it from other annotations
  – Domain-specific rules/reasoning needed
      • see example in SWRL doc
      • Painters may have painted in multiple styles
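
One way to "derive it from other annotations" is to count how often an artist co-occurs with a style in works that already carry both. The sketch below assumes hand-made annotation triples. The work identifiers, the `ulan:`/`aat:` names and the `min_support` threshold are all hypothetical, and this is not presented as the method used in the project.

```python
from collections import Counter, defaultdict

# Hypothetical annotations: (work, artist, style); style may be missing.
ANNOTATIONS = [
    ("ex:work1", "ulan:Matisse", "aat:Fauve"),
    ("ex:work2", "ulan:Matisse", "aat:Fauve"),
    ("ex:work3", "ulan:Matisse", "aat:Neo-Impressionist"),
    ("ex:work4", "ulan:Derain", "aat:Fauve"),
    ("ex:work5", "ulan:Derain", None),
]


def artist_styles(annotations, min_support=1):
    """Link each artist to every style seen on at least min_support of their works."""
    counts = defaultdict(Counter)
    for _work, artist, style in annotations:
        if style is not None:
            counts[artist][style] += 1
    return {
        artist: {style for style, n in styles.items() if n >= min_support}
        for artist, styles in counts.items()
    }


links = artist_styles(ANNOTATIONS)
strict_links = artist_styles(ANNOTATIONS, min_support=2)
print(links)
print(strict_links)
```

The result is a set of styles per artist rather than a single value, because painters may have worked in several styles. Raising `min_support` trades recall for fewer accidental links.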


Use case: extracting additional knowledge from scope notes


Use case: semantics for query expansion (Hollink)
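
A minimal sketch of thesaurus-based query expansion, assuming a broader/narrower hierarchy is available. The hierarchy fragment and the `expand` helper are hypothetical and are not taken from the cited work. The sketch adds narrower concepts down to a given depth and then the siblings of the query concept.

```python
# Hypothetical thesaurus fragment: concept -> its broader concept.
BROADER = {
    "armchair": "chair",
    "rocking chair": "chair",
    "chair": "seating furniture",
    "bench": "seating furniture",
    "seating furniture": "furniture",
}


def narrower(concept):
    """Direct children of a concept, in alphabetical order."""
    return sorted(c for c, b in BROADER.items() if b == concept)


def expand(concept, depth=1):
    """Return the concept, its descendants up to `depth` levels, then its siblings."""
    result = [concept]
    frontier = [concept]
    for _ in range(depth):
        frontier = [n for c in frontier for n in narrower(c)]
        result.extend(frontier)
    parent = BROADER.get(concept)
    if parent is not None:
        result.extend(s for s in narrower(parent) if s != concept)
    return result


expanded = expand("chair")
print(expanded)
```

Keeping the original concept first and siblings last gives a crude ordering by semantic distance, which a search interface could use for ranking.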


Issues

Many thesauri do not have a rich semantic structure like WordNet
Need for learning additional semantic relations between thesaurus concepts
Result: “ontologizing thesauri”
NLP is a crucial technique


Use case: supporting annotation of broadcasts

Current situation: mainly manual
Not feasible for large-scale digital archiving
Context documents for programs can be identified
Can we generate candidate annotations?
Example from CHOICE project
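
A minimal sketch of generating candidate annotations from a context document, assuming the simplest possible approach: count how often single-word thesaurus terms occur in the text and propose the most frequent ones. The term list, the sample text and the function name are hypothetical. This is an illustration only, not the CHOICE pipeline.

```python
import re
from collections import Counter

# Hypothetical single-word terms from an in-house thesaurus.
THESAURUS_TERMS = {"election", "parliament", "flood", "football"}


def candidate_annotations(context_text, terms, top_n=3):
    """Rank thesaurus terms by how often they occur in the context document."""
    tokens = re.findall(r"[a-z]+", context_text.lower())
    counts = Counter(token for token in tokens if token in terms)
    return counts.most_common(top_n)


text = (
    "Parliament debates the election. The election date is set; "
    "parliament votes Friday. Election!"
)
candidates = candidate_annotations(text, THESAURUS_TERMS)
print(candidates)
```

The output is a ranked list of suggestions for a human cataloguer to accept or reject. Multi-word terms, inflected forms and other languages would need real NLP tooling.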


Issues

Broadcast archives have their own annotation template
  – Typically specialization of Dublin Core
In-house thesaurus is usually available, but may be of limited use
  – Consider including other (public) thesauri
Multi-linguality is a prominent issue
Our experience: key role for user studies
  – Dramatic changes of the existing business process of the archive
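
Because an archive template is typically a specialization of Dublin Core, its records can be "dumbed down" to plain Dublin Core for exchange. The sketch below assumes the specialization is recorded as a property-to-superproperty mapping. The `arch:` property names and the sample record are hypothetical.

```python
# Hypothetical archive-specific properties and their Dublin Core superproperties.
SUBPROPERTY_OF = {
    "arch:episodeTitle": "dc:title",
    "arch:director": "dc:creator",
    "arch:presenter": "dc:contributor",
    "arch:broadcastDate": "dc:date",
}


def dumb_down(record):
    """Rewrite specialised properties to Dublin Core, merging values per property."""
    plain = {}
    for prop, values in record.items():
        target = SUBPROPERTY_OF.get(prop, prop)   # unknown properties pass through
        plain.setdefault(target, []).extend(values)
    return plain


record = {
    "arch:episodeTitle": ["Evening news"],
    "arch:director": ["A. Example"],
    "dc:creator": ["B. Example"],
}
plain_record = dumb_down(record)
print(plain_record)
```

The mapping loses the distinction between, say, director and creator, which is the price of interoperability with tools that only understand the generic element set.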


Use case: concept detectors in video (Snoek et al)


Challenges

Extremely tough problem
Example data: TRECVID 2005
Approach: combine content-based image retrieval with NLP and ontologies
Issue (among many others): context-specificity of TRECVID thesaurus

LSCOM lexicon: 229 - Weather


LSCOM lexicon: 110 – Female Anchor

Composite concept
Alignment needed for semantic search, e.g. with WordNet


Main observation of this talk

A combination of many different techniques is needed to be able to cope with the complexity of multimedia semantics
  – NLP, segmentation, CBIR, visual feature detectors, visual ontologies, publicly available thesauri, thesauri mappings, dedicated reasoning techniques (time, space, default), personalization, presentation generation
Multi-disciplinary approach is a must
  – And methods that combine text and ontologies are a key (but not the only) element of such an approach