e-culture semantic search pilot
Seminar, Stanford Medical Informatics, August 2006
MultimediaN Pilot E-Culture
2
Pilot E-Culture
Partners: VU, UvA, CWI, DEN, ICN
Subproject of MultimediaN, a 16 MEuro project on multimedia technology funded by the Dutch government
Aim: demonstrate added value of Semantic Web techniques for virtual heritage collections
3
4
Hypothesis
Semantic Web technology is particularly useful in knowledge-rich domains
or formulated differently
If we cannot show added value in knowledge-rich domains, then it may have no value at all
5
Use case: painting style
Find paintings of a similar style
KLIMT, Gustav
Portrait of Adele Bloch-Bauer I
1907
Oil and gold on canvas
138 x 138 cm
Austrian Gallery, Vienna
6
How can we find this other ‘Art nouveau’ painting?
MUNCH, Edvard
The Scream
1893
Oil, tempera and pastel on cardboard
91 x 73.5 cm
National Gallery, Oslo
7
Issues w.r.t. the use case
Parse annotation to find matches with thesauri terms
– E.g. match artists to ULAN individuals
Artist-style links
– AAT contains styles; ULAN contains artists, but there is no link
• Learn link from corpora
• Derive it from other annotations
– Domain-specific rules/reasoning needed
• See example in SWRL doc
• Painters may have painted in multiple styles
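The "derive it from other annotations" option above can be sketched as follows. This is a minimal illustration, not the project's method: the identifiers and the sample annotations are invented, and the only assumption is that some works already carry both a creator and a style term. Because painters may have painted in multiple styles, each artist maps to a set of styles.

```python
from collections import defaultdict

# Invented sample annotations: (work, artist, style). The identifiers are
# placeholders, not real ULAN or AAT entries.
ANNOTATED_WORKS = [
    ("work:1", "ulan:klimt", "aat:art-nouveau"),
    ("work:2", "ulan:klimt", "aat:art-nouveau"),
    ("work:3", "ulan:munch", "aat:expressionism"),
    ("work:4", "ulan:munch", "aat:art-nouveau"),
]

def derive_artist_styles(works):
    """Map each artist to the set of styles seen on that artist's works."""
    styles = defaultdict(set)
    for _work, artist, style in works:
        styles[artist].add(style)
    return dict(styles)

def similar_style_artists(artist, artist_styles):
    """Return other artists sharing at least one derived style."""
    own = artist_styles.get(artist, set())
    return sorted(
        other for other, other_styles in artist_styles.items()
        if other != artist and own & other_styles
    )

ARTIST_STYLES = derive_artist_styles(ANNOTATED_WORKS)
```

With such a derived link, a query starting from one artist's work can reach works by artists that share a style, which is the step the thesauri alone do not provide.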
8
[Figure: feature overview around DIGITAL HERITAGE COLLECTIONS, with semantic annotation on one side and semantic search on the other]
Natural-language processing: automatic annotation (text strings → concepts)
Distributed: cultuurwijzer.nl collections, OAI-based access
Reasoning support: time/space reasoning
Web interface: support for web collections
Presentation facilities: semantic presentation, device-specific
Interoperability: XML/RDF/OWL
Scalability: > 10,000,000 triples
Ontologies: WordNet, AAT, TGN, ULAN, Dutch labels
Search strategies: sibling search, semantic distance
Dublin Core: specializations, dumb-down
Legend: BASELINE, ENHANCED FEATURES, NEW FEATURES
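The Dublin Core "specializations" and "dumb-down" items above refer to the general Dublin Core principle that a refined property can be read as the plain element it refines. A minimal sketch, assuming a hypothetical `ex:` vocabulary (none of these property names come from the slides):

```python
# Hypothetical specialised properties and the Dublin Core element each refines.
SUBPROPERTY_OF = {
    "ex:painter": "dc:creator",
    "ex:paintedIn": "dc:date",
    "ex:artStyle": "dc:subject",
}

def dumb_down(record):
    """Reduce a record to plain Dublin Core elements.

    Specialised properties are replaced by the element they refine;
    properties with no Dublin Core counterpart are dropped.
    """
    simple = {}
    for prop, values in record.items():
        base = SUBPROPERTY_OF.get(prop, prop)
        if base.startswith("dc:"):
            simple.setdefault(base, []).extend(values)
    return simple

RECORD = {
    "ex:painter": ["Klimt, Gustav"],
    "dc:title": ["Portrait of Adele Bloch-Bauer I"],
    "ex:inventoryNo": ["(local number)"],
}
SIMPLE_RECORD = dumb_down(RECORD)
```

A client that only understands plain Dublin Core can then still use the specialised metadata, at the cost of losing the finer distinction.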
9
Architecture
10
Use of thesauri
RDF/OWL data models of Getty thesauri
– Issues: scope, preserving structure
WordNet: W3C SWBPD work
http://www.w3.org/TR/wordnet-rdf/
Multilingualism
– Dutch version of AAT
Existing collection metadata are parsed to find matches in thesauri (e.g. creator name => ULAN entry)
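Matching a creator name to a thesaurus entry, as in the last point above, could look roughly like this. It is a sketch under stated assumptions: the thesaurus excerpt and its identifiers are invented, and the normalisation (accent stripping, case folding, token sorting) is one plausible choice rather than the parser used in the project.

```python
import unicodedata

# Invented thesaurus excerpt: identifier -> known name variants.
THESAURUS = {
    "ulan:example-klimt": ["Klimt, Gustav", "Gustav Klimt"],
    "ulan:example-munch": ["Munch, Edvard", "Edvard Munch"],
}

def normalize(name):
    """Fold case, strip accents and sort tokens, so that
    'KLIMT, Gustav' and 'Gustav Klimt' compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    tokens = no_accents.replace(",", " ").lower().split()
    return " ".join(sorted(tokens))

def build_index(thesaurus):
    """Index every name variant by its normalised form."""
    index = {}
    for entry_id, variants in thesaurus.items():
        for variant in variants:
            index.setdefault(normalize(variant), set()).add(entry_id)
    return index

def match_creator(creator, index):
    """Return candidate thesaurus entries for a creator string."""
    return sorted(index.get(normalize(creator), set()))

INDEX = build_index(THESAURUS)
```

Returning a list of candidates rather than a single entry leaves room for ambiguous names, which would then need disambiguation from other metadata such as dates.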
11
Distributed vs. centralized collection data
Minimal requirement: collection object has image URI
Preference for external metadata, accessed through protocol such as OAI
In practice, external metadata access is still cumbersome
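As an illustration of protocol-based access to external metadata, the sketch below builds a harvesting request, assuming that "OAI" here means OAI-PMH. The verb `ListRecords` and the `oai_dc` metadata prefix are part of that protocol; the endpoint URL and set name are placeholders.

```python
from urllib.parse import urlencode

def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    base_url is the repository endpoint; set_spec optionally restricts
    the harvest to one set of the collection.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"

# Placeholder endpoint and set name, for illustration only.
EXAMPLE_URL = list_records_url("https://museum.example.org/oai", set_spec="paintings")
```

The response to such a request is an XML document with one record per collection object, which still has to be mapped onto the RDF data model.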
12
Search strategies
Basic search: keyword-oriented
Advanced search:
– Tweaking default search parameters
– Time-related queries
Faceted search
Relation search
– How are two URIs related?
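One way to answer "how are two URIs related?" is a shortest-path search over the triple graph. The sketch below uses breadth-first search over invented sample triples; the slides do not say which algorithm the demonstrator uses, so treat this as an assumption. A `^` prefix marks a property followed from object to subject.

```python
from collections import deque

# Invented sample triples (subject, property, object).
TRIPLES = [
    ("work:portrait", "creator", "artist:klimt"),
    ("artist:klimt", "style", "style:art-nouveau"),
    ("work:scream", "creator", "artist:munch"),
    ("artist:munch", "style", "style:art-nouveau"),
]

def build_neighbours(triples):
    """Adjacency lists in both directions; '^p' marks an inverse step."""
    neighbours = {}
    for s, p, o in triples:
        neighbours.setdefault(s, []).append((p, o))
        neighbours.setdefault(o, []).append((f"^{p}", s))
    return neighbours

def find_relation(start, goal, triples, max_hops=4):
    """Return the shortest chain of (node, property, node) steps linking
    start to goal, or None if none exists within max_hops."""
    neighbours = build_neighbours(triples)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) >= max_hops:
            continue
        for prop, nxt in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, prop, nxt)]))
    return None

RELATION = find_relation("work:portrait", "work:scream", TRIPLES)
```

In the sample data the two works are linked through their creators and a shared style, which is the kind of explanation a relation search would present to the user.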
13
Keyword search with semantic clustering
1. Btree of literals plus Porter stem and metaphone index
2. Find resources with matching labels
• Default resources are “Work”s
3. Find related resources by one-way graph traversal
• owl:inverseOf is used
• Threshold used for constraining search
4. Cluster results (group instances)
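Steps 2 to 4 above can be sketched in a few lines. Several simplifications are assumptions of this sketch, not statements about the demonstrator: a lower-cased token index stands in for the Btree with Porter stem and metaphone indexes, edges are stored already pointing from a concept toward the works annotated with it (the role owl:inverseOf plays in the slides), and the edge weights and threshold value are invented.

```python
from collections import defaultdict

# Invented sample data.
LABELS = {
    "style:art-nouveau": "Art nouveau",
    "artist:klimt": "Gustav Klimt",
    "work:portrait": "Portrait of Adele Bloch-Bauer I",
}
WORKS = {"work:portrait", "work:scream"}
# node -> (property, next node, weight); weights below 1 make long or
# cyclic paths fall under the threshold eventually.
EDGES = {
    "style:art-nouveau": [("style-of", "artist:klimt", 0.8),
                          ("style-of", "artist:munch", 0.8)],
    "artist:klimt": [("creator-of", "work:portrait", 0.9)],
    "artist:munch": [("creator-of", "work:scream", 0.9)],
}

def build_label_index(labels):
    """Token -> resources whose label contains that token."""
    index = defaultdict(set)
    for resource, label in labels.items():
        for token in label.lower().split():
            index[token].add(resource)
    return index

def search(keyword, threshold=0.5):
    """Return works grouped by the property path that links them to a
    resource whose label matches the keyword."""
    index = build_label_index(LABELS)
    clusters = defaultdict(set)
    stack = [(res, (), 1.0) for res in sorted(index.get(keyword.lower(), ()))]
    while stack:
        node, path, score = stack.pop()
        if node in WORKS:
            clusters[path].add(node)  # step 4: group by connecting path
            continue
        for prop, nxt, weight in EDGES.get(node, []):
            new_score = score * weight
            if new_score >= threshold:  # step 3: prune weak paths
                stack.append((nxt, path + (prop,), new_score))
    return {path: sorted(works) for path, works in clusters.items()}
```

Grouping by path means a search for "nouveau" returns works clustered under "works by artists of a matching style", separately from any works whose own label matches.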
14
Demonstrator
15
Search: WordNet patterns that increase recall without sacrificing precision (Hollink)
16
Triple statistics
17
Status
4-year project, now in month 18
Short-term goals:
– Adding more ethnological collections
– Location-oriented presentation
– User studies with professional users (museum people) and interested lay persons
– Multi-lingual interface (English, Dutch, Indonesian)
18
Issues
Getting access to collections is mainly a social process
– There is usually no principled objection to making data, metadata and thesauri publicly available, but it still feels threatening
Cultural heritage is a good area for a Semantic Web “island”:
– lots of domain-specific knowledge
– strong application pull
– enormous amount of existing annotations, which have been built up over centuries
19
On-line demo
http://e-culture.multimedian.nl