Multi-Concept Alignment and Evaluation
Shenghui Wang, Antoine Isaac,
Lourens van der Meij, Stefan Schlobach
Ontology Matching Workshop
Oct. 11th, 2007
Introduction: Multi-Concept Alignment
• Mappings involving combinations of concepts
  • o1:FruitsAndVegetables → (o2:Fruits OR o2:Vegetables)
• Also referred to as multiple or complex alignment
• Problem: only a few matching tools consider it
  • Cf. [Euzenat & Shvaiko]
Why is MCA a Difficult Problem?
• Much larger search space: |O1| × |O2| → 2^|O1| × 2^|O2|
• How to measure similarity between sets of concepts?
  • Based on which information and strategies?
  • "Fruits and vegetables" vs. "Fruits" and "Vegetables" together
• Formal frameworks for MCA?
  • Representation primitives: owl:intersectionOf? skosm:AND?
  • Semantics: does A skos:broader (skosm:AND B C) entail A broader B and A broader C? (see the sketch below)
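To make the open question concrete, here is a minimal rdflib sketch of one possible encoding, assuming a hypothetical skosm:AND node with skosm:member links and the distribution semantics the slide asks about; none of these primitives are fixed by the slides.

```python
from rdflib import BNode, Graph, Namespace, RDF

# All namespaces are placeholders; skosm:AND / skosm:member are assumptions
# standing in for whatever combination primitive a framework would choose.
SKOS = Namespace("http://www.w3.org/2004/02/skos/core#")
SKOSM = Namespace("http://example.org/skos-mapping#")
EX = Namespace("http://example.org/vocab#")

g = Graph()

# Encode: A skos:broader (skosm:AND B C)
combination = BNode()
g.add((combination, RDF.type, SKOSM.AND))
g.add((combination, SKOSM.member, EX.B))
g.add((combination, SKOSM.member, EX.C))
g.add((EX.A, SKOS.broader, combination))

# Candidate semantics from the slide: broader distributes over AND,
# entailing A broader B and A broader C.
for member in list(g.objects(combination, SKOSM.member)):
    g.add((EX.A, SKOS.broader, member))

assert (EX.A, SKOS.broader, EX.B) in g
assert (EX.A, SKOS.broader, EX.C) in g
```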
Agenda
• The multi-concept alignment problem
• The Library case and the need for MCA
• Generating MCAs for the Library case
• Evaluating MCAs in the Library case
• Conclusion
Yet MCA is needed in real-life problems
• KB (National Library of the Netherlands) collections (cf. OAEI slides)
• Scenario: re-annotation of GTT-indexed books by Brinkman concepts
(Diagram: the KB's Scientific Collection, 1.4M books indexed with GTT, and Depot, 1M books indexed with Brinkman)
Yet MCA is needed in real-life problems
• Books can be indexed by several concepts
  • With post-coordination: co-occurrence matters
  • {G1="History", G2="the Netherlands"} in GTT → a book about Dutch history
• The granularity of the two vocabularies differs
  • → {B1="Netherlands; History"} in Brinkman
• The alignment should therefore associate combinations of concepts (see the sketch below)
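As a minimal sketch, the example above could be written down as a conversion rule pairing concept sets; the frozenset encoding and the string labels are illustrative assumptions, not the paper's actual format.

```python
# A conversion rule pairs a set of GTT concepts with a set of Brinkman
# concepts; frozensets keep the combinations hashable and order-free.
rules = [
    (frozenset({"History", "the Netherlands"}),   # G1, G2: post-coordinated
     frozenset({"Netherlands; History"})),        # B1: pre-coordinated
]

book_gtt = frozenset({"History", "the Netherlands"})
for g_side, b_side in rules:
    if g_side <= book_gtt:  # the book's annotation covers the rule's GTT side
        print("Proposed Brinkman annotation:", set(b_side))
```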
Agenda
• The multi-concept alignment problem
• The Library case and the need for MCA
• Generating MCAs for the Library case
• Evaluating MCAs in the Library case
• Conclusion
MCA for Annotation Translation: Approach
• Produce similarity measures between individual concepts
  • Sim(A, B) = X
• Group concepts based on their similarity
  • {G1, B1, G2, G3, B2}
• Create conversion rules
  • {G1, G2, G3} → {B1, B2}
• Extract a deployable alignment (the pipeline is sketched below)
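A high-level sketch of the four steps, under stated assumptions: books arrive as (id, concept-set) pairs with vocabulary-prefixed concept names, and `similarity` and `group_concepts` are pluggable callables that the next slides instantiate; every name here is hypothetical.

```python
from itertools import combinations

def build_alignment(books, similarity, group_concepts):
    """Hypothetical four-step pipeline: similarities -> groups -> rules."""
    # 1. Similarity between individual concepts, across and within vocabularies.
    concepts = sorted({c for _, annotation in books for c in annotation})
    sim = {frozenset((a, b)): similarity(a, b)
           for a, b in combinations(concepts, 2)}

    # 2. Group concepts based on their similarity (ranking or clustering).
    groups = group_concepts(concepts, sim)

    # 3. Create conversion rules by splitting each mixed group per vocabulary.
    rules = []
    for group in groups:
        gtt = frozenset(c for c in group if c.startswith("gtt:"))
        brinkman = frozenset(c for c in group if c.startswith("brinkman:"))
        if gtt and brinkman:
            rules.append((gtt, brinkman))

    # 4. The deployable alignment is the resulting rule set.
    return rules
```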
MCA Creation: Similarity Measures
• KB scenario has dually indexed books
• Brinkman and GTT concepts co-occur
• Instance-based alignment techniques can be used
• Between concepts from the same vocabulary, similarity mirrors possible combinations!
(Diagram: Scientific Collection, 1.4M books, GTT; Depot, 1M books, Brinkman; 250K dually indexed books in the overlap)
MCA Creation: 2 Similarity Measures
• Jaccard overlap measure applied to concept extensions
• Latent Semantic Analysis
  • Computation of a similarity matrix
  • Filters the noise due to insufficient data
• Both give similarities between concepts across vocabularies and within vocabularies (sketched below)
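A minimal sketch of both measures, assuming a binary concept-by-book occurrence matrix built from the dually indexed corpus; the rank k and the plain binary weighting are assumptions.

```python
import numpy as np

def jaccard(ext_a: set, ext_b: set) -> float:
    """Jaccard overlap of two concept extensions (sets of book ids)."""
    union = ext_a | ext_b
    return len(ext_a & ext_b) / len(union) if union else 0.0

def lsa_similarity(occurrence: np.ndarray, k: int = 100) -> np.ndarray:
    """Concept-by-concept cosine similarities in a rank-k latent space.

    occurrence[i, j] = 1 iff concept i annotates book j.  Truncating
    the SVD to k dimensions is what filters the noise left by sparse,
    insufficient data.
    """
    u, s, _ = np.linalg.svd(occurrence, full_matrices=False)
    k = min(k, s.size)
    latent = u[:, :k] * s[:k]                     # one row per concept
    norms = np.linalg.norm(latent, axis=1, keepdims=True)
    latent /= np.where(norms == 0.0, 1.0, norms)  # guard empty concepts
    return latent @ latent.T
```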
MCA Creation: 2 Concept Aggregation Methods
• Simple Ranking (sketched below)
  • For a concept, take the top k most similar concepts
  • Gather the GTT concepts and the Brinkman ones
• Clustering
  • Partition concepts into similarity-based clusters
  • Gather concepts
  • A global approach: the most relevant combinations should be selected
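A sketch of the Simple Ranking aggregation, reusing the similarity lookup from the pipeline sketch; k = 3 and the vocabulary prefixes are assumptions (the slide only says "top k"). Clustering would instead partition all concepts globally before the same per-vocabulary split.

```python
def simple_ranking(concept, concepts, sim, k=3):
    """Group a concept with its top-k most similar ones, then split the
    group into its GTT and Brinkman parts to seed one conversion rule."""
    ranked = sorted((c for c in concepts if c != concept),
                    key=lambda c: sim[frozenset((concept, c))],
                    reverse=True)
    group = {concept, *ranked[:k]}
    gtt = frozenset(c for c in group if c.startswith("gtt:"))
    brinkman = frozenset(c for c in group if c.startswith("brinkman:"))
    return gtt, brinkman
```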
Generated Rules
• Clustering generated far fewer rules
  • But with more concepts per rule
Agenda
• The multi-concept alignment problem
• The Library case and the need for MCA
• Generating MCAs for the Library case
• Evaluating MCAs in the Library case
• Conclusion
Evaluation Method: Data Sets
• Training and evaluation sets drawn from the dually indexed books
  • 2/3 training, 1/3 testing
• Two training sets (samples)
  • Random
  • Rich: books that have at least 8 annotations (over both thesauri)
(Diagram: Scientific Collection, 1.4M books, GTT; Depot, 1M books, Brinkman; 250K dually indexed books in the overlap)
Evaluation Method: Applying Rules
• Several configurations for firing rules, matching a book's test annotation Gt against a rule's antecedent Gr:
  • 1. Gt = Gr
  • 2. Gt ⊇ Gr
  • 3. Gt ∩ Gr ≠ ∅
  • 4. ALL
(Diagram: a book's GTT annotation Gt matched against rules Gr1 → Br1, Gr2 → Br2, Gr3 → Br3)
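A sketch of the four firing configurations; strategies 2 and 3 are read here as containment and overlap, which is a reconstruction (the set symbols did not survive the slide export).

```python
def translate(rules, g_t, strategy):
    """Union of the Brinkman sides of all rules firing for annotation g_t."""
    proposed = set()
    for g_r, b_r in rules:
        fires = bool(strategy == 1 and g_t == g_r      # 1. exact match
                     or strategy == 2 and g_r <= g_t   # 2. Gt contains Gr
                     or strategy == 3 and g_r & g_t    # 3. any overlap
                     or strategy == 4)                 # 4. ALL rules fire
        if fires:
            proposed |= b_r
    return proposed
```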
Evaluation Measures
• Precision and recall for matched books
  • Books that were given at least one good Brinkman annotation
  • Pb, Rb
• Precision and recall for annotation translation
  • Averaged over books (see the sketch below)
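One plausible reading of these measures as code, assuming test books come as (GTT annotation, gold Brinkman annotation) pairs and `translate` is a rule-firing function like the sketch above, partially applied to a rule set and strategy; the exact denominators are assumptions.

```python
def evaluate(test_books, translate):
    """Book-level Pb/Rb plus annotation-level precision/recall."""
    matched = translated = 0
    prec_sum = rec_sum = 0.0
    for g_t, gold in test_books:
        proposed = translate(g_t)
        good = proposed & gold
        if proposed:
            translated += 1
            prec_sum += len(good) / len(proposed)
        if good:
            matched += 1                  # at least one good annotation
        if gold:
            rec_sum += len(good) / len(gold)
    p_b = matched / translated if translated else 0.0       # Pb
    r_b = matched / len(test_books) if test_books else 0.0  # Rb
    p_ann = prec_sum / translated if translated else 0.0
    r_ann = rec_sum / len(test_books) if test_books else 0.0
    return p_b, r_b, p_ann, r_ann
```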
Results: ALL Strategy
Results: Rich vs. Random Training Set
• Rich does not improve the results much
  • Bias towards richly annotated books
• Jaccard performance goes down
• LSA does better
• Statistical corrections allow simple grouping techniques to cope with data complexity
Results: Clustering
Results: Jaccard vs. LSA
• For strategies 3 and ALL, LSA outperforms Jaccard
• For strategies 1 and 2, Jaccard outperforms LSA
• Simple similarity is better at finding explicit similarities
  • Those really occurring in books
• LSA is better at finding potential similarities
Results: Using LSA
Results: Clustering vs. Ranking
• Clustering performs better on strategies 1 and 2
  • Its rules match existing annotations better
  • They have better precision
• Ranking has higher recall but lower precision
  • The classical tradeoff (ranking keeps more noise)
Agenda
• The multi-concept alignment problem
• The Library case and the need for MCA
• Generating MCAs for the Library case
• Evaluating MCAs in the Library case
• Conclusion
Conclusions
• There is an important problem: multi-concept alignment
  • Not extensively dealt with in the current literature
  • Needed by applications
• We have first approaches to create such alignments
  • And to deploy them!
• We hope that further research will improve the situation (with our 'deployer' hat on)
  • Better alignments
  • More precise frameworks (methodology research)
Conclusions: Performance
• Evaluation shows mixed results
  • Performance is generally very low
  • These techniques cannot be used alone
• Note: dependence on requirements
  • Settings where a manual indexer chooses among several candidates allow for lower precision
• Note: indexing variability
  • OAEI has demonstrated that manual evaluation somehow compensates for the bias of the automatic one
Thanks!