First Insights into the Library
Track of the OAEI
Dominique Ritze, Mannheim University Library
Motivation
[Figure: Publication X is indexed with the subject "ontology alignment" from thesaurus 2. A search for "Ontology Mapping", the corresponding term from thesaurus 1, returns 0 results. With an alignment stating "Ontology Mapping" (thesaurus 1) = "Ontology Alignment" (thesaurus 2), the same search finds publication X.]
Overview
• Ontology Matching
• OAEI
• Thesaurus vs. Ontology
• OAEI Library Track 2012
• Lessons Learned and Future Work
Ontology Matching
Example output of a matching system (correspondences with confidence values):
• < Author, Author, =, 0.97 >
• < Paper, Paper, =, 0.94 >
• < reviews, reviews, =, 0.91 >
• < writes, writes, =, 0.7 >
• < Person, People, =, 0.8 >
• < Document, Doc, =, 0.7 >
• < Reviewer, Review, =, 0.6 >
• …
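Correspondences like these can be produced by simple string similarity over entity labels. A minimal sketch in Python (the labels and the 0.5 threshold are illustrative, not those of any submitted system):

```python
# Toy label-based matcher: compare labels pairwise with a string-similarity
# measure and keep pairs above a threshold (illustration only; real OAEI
# systems combine many more techniques).
from difflib import SequenceMatcher

def match_labels(labels1, labels2, threshold=0.5):
    """Return correspondences < e1, e2, =, confidence > above a threshold."""
    alignment = []
    for a in labels1:
        for b in labels2:
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= threshold:
                alignment.append((a, b, "=", round(score, 2)))
    return alignment

pairs = match_labels(["Author", "Person", "Document"],
                     ["Author", "People", "Doc"])
for p in pairs:
    print(p)  # e.g. ('Author', 'Author', '=', 1.0)
```

Identical labels score 1.0, while pairs like Person/People land just at the threshold, which is why real matchers tune such cut-offs carefully.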
Ontology Matching Evaluation
[Figure: evaluation workflow — a matching tool takes two test ontologies O1 and O2 as input and produces an alignment A; A is compared against a reference alignment R with a measure m to obtain the result.]
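The comparison step boils down to set operations between the produced alignment A and the reference R. A sketch of the standard measures (the example correspondences are invented):

```python
# Standard evaluation measures over correspondence sets:
# precision = |A ∩ R| / |A|, recall = |A ∩ R| / |R|,
# F-measure = harmonic mean of the two.
def evaluate(alignment, reference):
    correct = len(alignment & reference)
    precision = correct / len(alignment)
    recall = correct / len(reference)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

A = {("Person", "People"), ("Paper", "Paper"), ("Doc", "Document")}
R = {("Person", "People"), ("Paper", "Paper"), ("Author", "Author")}
p, r, f = evaluate(A, R)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.67 0.67 0.67
```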
Ontology Alignment Evaluation Initiative (OAEI)
• Annual campaign, started in 2005
• Different tracks/datasets
  • Benchmark, Anatomy, Conference, Multifarm, Large BioMed, Library, Instance Matching
• 21 submitted systems (2012)
• Goal: improving the performance of the ontology matching field
  • Through comparison of algorithms
  • New challenges for the systems
Thesaurus = Ontology?

SKOS                            OWL
skos:Concept                    owl:Class
skos:prefLabel, skos:altLabel   rdfs:label
skos:scopeNote, skos:notation   rdfs:comment
A skos:narrower B               B rdfs:subClassOf A
A skos:broader B                A rdfs:subClassOf B
skos:related                    rdfs:seeAlso
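The table's mapping can be sketched as a toy triple rewriter. This is a pure-Python illustration with invented example triples; an actual transformation would use an RDF library such as rdflib:

```python
# Toy SKOS-to-OWL triple rewriter following the mapping table above
# (illustration only; not the transformation used in the track).
PROPERTY_MAP = {
    "skos:prefLabel": "rdfs:label",
    "skos:altLabel": "rdfs:label",
    "skos:scopeNote": "rdfs:comment",
    "skos:notation": "rdfs:comment",
    "skos:related": "rdfs:seeAlso",
}

def skos_to_owl(triples):
    out = []
    for s, p, o in triples:
        if p == "rdf:type" and o == "skos:Concept":
            out.append((s, "rdf:type", "owl:Class"))
        elif p == "skos:narrower":
            # s skos:narrower o: o is the narrower concept, so o subClassOf s
            out.append((o, "rdfs:subClassOf", s))
        elif p == "skos:broader":
            # s skos:broader o: o is the broader concept, so s subClassOf o
            out.append((s, "rdfs:subClassOf", o))
        elif p in PROPERTY_MAP:
            out.append((s, PROPERTY_MAP[p], o))
        else:
            out.append((s, p, o))
    return out

triples = [(":Fruit", "rdf:type", "skos:Concept"),
           (":Fruit", "skos:narrower", ":TropicalFruit")]
print(skos_to_owl(triples))
```

Note that the hierarchy directions follow the SKOS Reference: the object of skos:broader is the more general concept, and the object of skos:narrower the more specific one.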
[Figure: example concepts from the thesauri, e.g. "Commodities", "Germany", "Ananas" (German: pineapple), "Tropical Fruit", and the pair "Metal Product -> Metal".]
OAEI Library Track
Are current state-of-the-art ontology matching tools able to match thesauri?
Dominique Ritze, Kai Eckert, Benjamin Zapilko, Joachim Neubert
Data Set
• Thesaurus for Economics (STW)
  • 6,000 concepts with 19,000 additional keywords (EN, DE)
• Thesaurus for the Social Sciences (TheSoz)
  • 8,000 concepts with 4,000 additional keywords (EN, DE, FR)
• Reference alignment manually created in 2006
• Both actively used in libraries for keyword indexing
Execution
• 7 GB Debian machine
• Timeframe: 1 week
• 13 of the 21 submitted systems were able to generate an alignment
• No system had a heap space problem
• Evaluation: precision, recall, F-measure, runtime
Results
How to evaluate the results? Is an F-measure of 0.67 good?

System      Precision  Recall  F-Measure  Time (s)  Size   1:1
GOMMA       0.537      0.906   0.674      804       4712
ServOMapLt  0.654      0.687   0.670      45        2938
LogMap      0.688      0.644   0.665      95        2620
ServOMap    0.717      0.619   0.665      44        2413   yes
YAM++       0.595      0.750   0.664      496       3522
LogMapLt    0.577      0.776   0.662      21        3756
G02A        0.675      0.645   0.660      32773     2671
Hertuda     0.465      0.925   0.619      14363     5559
WeSeE       0.612      0.607   0.609      144070    2774   yes
HotMatch    0.645      0.575   0.608      14494     2494   yes
CODI        0.434      0.481   0.456      39869     3100   yes
MapSSS      0.520      0.184   0.272      2171      989    yes
AROMA       0.107      0.652   0.184      1096      17001
Optima      0.321      0.072   0.117      37457     624
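The 1:1 column marks systems that return one-to-one alignments, i.e. each concept participates in at most one correspondence. A common way to obtain such an alignment from scored correspondences is greedy extraction; a minimal sketch (not any particular system's implementation, example data invented):

```python
# Greedy 1:1 extraction: walk correspondences in descending score order and
# keep a pair only if neither entity has been matched yet.
def greedy_one_to_one(correspondences):
    """correspondences: iterable of (entity1, entity2, score) tuples."""
    used1, used2, kept = set(), set(), []
    for e1, e2, score in sorted(correspondences, key=lambda c: -c[2]):
        if e1 not in used1 and e2 not in used2:
            kept.append((e1, e2, score))
            used1.add(e1)
            used2.add(e2)
    return kept

cands = [("a", "x", 0.9), ("a", "y", 0.8), ("b", "x", 0.7), ("b", "y", 0.6)]
print(greedy_one_to_one(cands))  # [('a', 'x', 0.9), ('b', 'y', 0.6)]
```

Greedy extraction is a heuristic; it typically raises precision at some cost in recall, which matches the precision/recall trade-offs visible in the table.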
Results

System       Precision  Recall  F-Measure  Time (s)  Size   1:1
MatcherPref  0.820      0.642   0.720      75        2190
MatcherDE    0.891      0.601   0.717      42        1885
MatcherAll   0.544      0.896   0.677      735       4605
GOMMA        0.537      0.906   0.674      804       4712
ServOMapLt   0.654      0.687   0.670      45        2938
LogMap       0.688      0.644   0.665      95        2620
…
MatcherEN    0.808      0.439   0.569      36        1518
CODI         0.434      0.481   0.456      39869     3100   yes
MapSSS       0.520      0.184   0.272      2171      989    yes
AROMA        0.107      0.652   0.184      1096      17001
Optima       0.321      0.072   0.117      37457     624
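The Matcher* rows are simple label-based baselines (restricted to preferred labels, or to labels of one language). The idea behind such a baseline can be sketched as an exact-label comparison; the data and function names below are invented for illustration, not the actual OAEI baseline code:

```python
# Toy exact-label baseline: align two concepts whenever they share a label.
def exact_label_baseline(concepts1, concepts2):
    """concepts: dict mapping concept id -> set of lowercase labels."""
    alignment = set()
    for c1, labels1 in concepts1.items():
        for c2, labels2 in concepts2.items():
            if labels1 & labels2:  # at least one label in common
                alignment.add((c1, c2))
    return alignment

stw = {"stw:1": {"ontology mapping"}}
thesoz = {"thesoz:9": {"ontology mapping", "ontology alignment"}}
print(exact_label_baseline(stw, thesoz))  # {('stw:1', 'thesoz:9')}
```

That baselines of this kind outperform most submitted systems in F-measure is one of the notable observations of the table above.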
Manual Evaluation
• Between 38 and 269 new correct correspondences found per matcher
• Up to half of the new correspondences were correct
• Many new correspondences are quite simple
• Some are more "complex" and interesting, e.g. "Automated production" = "CAM"
• Several incorrect ones where the labels are quite similar
  • Difficult to distinguish the names of countries, their inhabitants, and their languages
Lessons Learned
• The transformation from SKOS to OWL causes some problems, especially regarding the labels
• Ontology matching systems are nevertheless able to match the thesauri and even discover previously unknown correct correspondences
• The community is interested in this topic
Future Work
• Update the reference alignment and adapt the results
• SKOS import for matching systems
• Use instance data to match thesauri?
• Other thesauri?
Thank you for your attention!