Multi-Concept Alignment and Evaluation
Shenghui Wang, Antoine Isaac, Lourens van der Meij, Stefan Schlobach
Ontology Matching Workshop, Oct. 11th, 2007


Page 1:

Multi-Concept Alignment and Evaluation

Shenghui Wang, Antoine Isaac,

Lourens van der Meij, Stefan Schlobach

Ontology Matching Workshop

Oct. 11th, 2007

Page 2:

Introduction: Multi-Concept Alignment

• Mappings involving combinations of concepts
  • o1:FruitsAndVegetables → (o2:Fruits OR o2:Vegetables)

• Also referred to as: multiple or complex alignment

• Problem: only a few matching tools consider it
  • Cf. [Euzenat & Shvaiko]

Page 3:

Why is MCA a Difficult Problem?

• Much larger search space: |O1| × |O2| → 2^|O1| × 2^|O2|

• How to measure similarity between sets of concepts?
  • Based on which information and strategies?
  • “Fruits and vegetables” vs. “Fruits” and “Vegetables” together

• Formal frameworks for MCA?
  • Representation primitives
    • owl:intersectionOf? skosm:AND?
  • Semantics
    • A skos:broader (skosm:AND B C) ⇒ A broader B & A broader C?
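To make the search-space blow-up concrete, here is a minimal sketch; the vocabulary sizes are illustrative, not taken from the Library case:

```python
# One-to-one matching considers |O1| x |O2| candidate pairs;
# multi-concept matching considers every pair of concept *subsets*,
# i.e. 2^|O1| x 2^|O2| candidate pairs.
def one_to_one_candidates(n1: int, n2: int) -> int:
    return n1 * n2

def multi_concept_candidates(n1: int, n2: int) -> int:
    return (2 ** n1) * (2 ** n2)

# Even two tiny 10-concept vocabularies explode the candidate space.
print(one_to_one_candidates(10, 10))     # 100
print(multi_concept_candidates(10, 10))  # 1048576
```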

Page 4:

Agenda

• The multi-concept alignment problem

• The Library case and the need for MCA

• Generating MCAs for the Library case

• Evaluating MCAs in the Library case

• Conclusion

Page 5:

Yet MCA is needed in real-life problems

• KB collections (cf. OAEI slides)

• Scenario: re-annotation of GTT-indexed books by Brinkman concepts

[Figure: KB collections: the Scientific Collection (1.4M books, indexed with GTT) and the Depot (1M books, indexed with Brinkman)]

Page 6:

Yet MCA is needed in real-life problems

• Books can be indexed by several concepts

• with post-coordination: co-occurrence matters

{G1=“History”, G2=“the Netherlands”} in GTT
→ a book about Dutch history

• The granularity of the two vocabularies differs
→ {B1=“Netherlands; History”}

• The alignment should associate combinations of concepts


Page 7:

Agenda

• The multi-concept alignment problem

• The Library case and the need for MCA

• Generating MCAs for the Library case

• Evaluating MCAs in the Library case

• Conclusion

Page 8:

MCA for Annotation Translation: Approach

• Produce similarity measures between individual concepts
  • Sim(A,B) = x

• Group concepts based on their similarity
  • {G1, B1, G2, G3, B2}

• Create conversion rules
  • {G1, G2, G3} → {B1, B2}

• Extract a deployable alignment
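The steps above can be sketched end to end; the grouping step is deliberately simplified to a similarity threshold, and all names and scores are illustrative:

```python
# End-to-end sketch: similarity -> grouping -> conversion rules.
def build_rules(gtt, brinkman, sim, threshold=0.5):
    """Group cross-vocabulary concepts whose similarity exceeds a
    threshold, then emit one GTT-set -> Brinkman-set conversion rule
    per group (a deliberately simplified grouping step)."""
    rules = []
    for g in gtt:
        bs = {b for b in brinkman if sim(g, b) >= threshold}
        if bs:
            rules.append((frozenset({g}), frozenset(bs)))
    return rules

# Toy similarity table standing in for a real measure.
scores = {("g1", "b1"): 0.9, ("g1", "b2"): 0.6, ("g2", "b1"): 0.1}
sim = lambda a, b: scores.get((a, b), 0.0)
print(build_rules(["g1", "g2"], ["b1", "b2"], sim))
```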

Page 9:

MCA Creation: Similarity Measures

• KB scenario has dually indexed books

• Brinkman and GTT concepts co-occur

• Instance-based alignment techniques can be used

• Between concepts from the same vocabulary, similarity mirrors possible combinations!

[Figure: KB collections: the Scientific Collection (1.4M books, GTT) and the Depot (1M books, Brinkman), with an overlap of 250K dually indexed books]

Page 10:

MCA Creation: 2 Similarity Measures

• Jaccard overlap measure applied to concept extensions

• Latent Semantic Analysis
  • Computation of a similarity matrix
  • Filters noise due to insufficient data

• Similarity between concepts across vocabularies and within vocabularies
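The Jaccard measure on concept extensions can be sketched as follows, taking a concept's extension to be the set of dually indexed books annotated with it (function and variable names are illustrative):

```python
def jaccard(extension_a: set, extension_b: set) -> float:
    """Jaccard overlap between two concept extensions (sets of book ids)."""
    if not extension_a and not extension_b:
        return 0.0
    inter = len(extension_a & extension_b)
    union = len(extension_a | extension_b)
    return inter / union

# Illustrative extensions: books annotated with each concept.
gtt_history = {"b1", "b2", "b3", "b4"}
brinkman_dutch_history = {"b2", "b3", "b4", "b5"}
print(jaccard(gtt_history, brinkman_dutch_history))  # 0.6 (3 shared / 5 total)
```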

Page 11:

MCA Creation: 2 Concept Aggregation Methods

• Simple ranking
  • For a concept, take the top k most similar concepts
  • Gathers GTT concepts and Brinkman ones

• Clustering
  • Partitions concepts into similarity-based clusters
  • Gathers concepts from both vocabularies

Global approach: the most relevant combinations should be selected
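The simple-ranking aggregation can be sketched as a top-k selection over a precomputed similarity function; all names and scores here are illustrative:

```python
def top_k_similar(concept, candidates, sim, k=3):
    """Return the k concepts most similar to `concept`, best first."""
    return sorted(candidates, key=lambda c: sim(concept, c), reverse=True)[:k]

# Toy similarity table standing in for Jaccard or LSA scores.
scores = {("g1", "b1"): 0.9, ("g1", "b2"): 0.4, ("g1", "b3"): 0.7}
sim = lambda a, b: scores.get((a, b), 0.0)
print(top_k_similar("g1", ["b1", "b2", "b3"], sim, k=2))  # ['b1', 'b3']
```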

Page 12:

Generated Rules

• Clustering generated far fewer rules
  • But with more concepts per rule

Page 13:

Agenda

• The multi-concept alignment problem

• The Library case and the need for MCA

• Generating MCAs for the Library case

• Evaluating MCAs in the Library case

• Conclusion

Page 14:

Evaluation Method: data sets

• Training and evaluation set from dually-indexed books

• 2/3 training, 1/3 testing

• Two training sets (samples)
  • Random
  • Rich: books that have at least 8 annotations (both thesauri)

[Figure: KB collections: the Scientific Collection (1.4M books, GTT) and the Depot (1M books, Brinkman), with an overlap of 250K dually indexed books]

Page 15:

Evaluation Method: Applying Rules

• Several configurations for firing rules

• 1. Gt = Gr (exact match)

• 2. Gr ⊆ Gt

• 3. Gt ∩ Gr ≠ ∅

• 4. ALL (fire unconditionally)

[Figure: a book's GTT annotation set Gt matched against rules Gr1→Br1, Gr2→Br2, Gr3→Br3]
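The four firing configurations can be sketched as predicates over a book's GTT annotation set Gt and a rule's left-hand side Gr. The slide's comparison symbols were lost in extraction, so the exact conditions below (equality, containment, overlap, always) are a reconstruction from the ordering, strictest to most permissive:

```python
def fires(strategy: int, gt: set, gr: set) -> bool:
    """Decide whether a rule with left-hand side `gr` fires on a book
    annotated with `gt`, under one of four increasingly permissive
    strategies (a reconstruction; the slide's symbols were lost)."""
    if strategy == 1:        # Gt = Gr: exact match
        return gt == gr
    if strategy == 2:        # Gr subset of Gt: rule fully covered
        return gr <= gt
    if strategy == 3:        # Gt and Gr overlap
        return bool(gt & gr)
    return True              # ALL: fire unconditionally

gt = {"g1", "g2"}
print(fires(1, gt, {"g1"}))         # False
print(fires(2, gt, {"g1"}))         # True
print(fires(3, gt, {"g1", "g9"}))   # True
```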

Page 16:

Evaluation Measures

• Precision and recall for matched books
  • Books that were given at least one good Brinkman annotation
  • Pb, Rb

• Precision and recall for annotation translation
  • Averaged over books
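The annotation-translation measures can be sketched per book and then averaged, assuming each book comes with a set of generated and a set of gold Brinkman annotations (names and data are illustrative):

```python
def book_precision_recall(generated: set, gold: set):
    """Precision and recall of translated annotations for one book."""
    correct = len(generated & gold)
    p = correct / len(generated) if generated else 0.0
    r = correct / len(gold) if gold else 0.0
    return p, r

def averaged_measures(books):
    """`books` is a list of (generated, gold) annotation-set pairs;
    returns precision and recall averaged over books."""
    pairs = [book_precision_recall(g, t) for g, t in books]
    n = len(pairs)
    return (sum(p for p, _ in pairs) / n, sum(r for _, r in pairs) / n)

books = [({"b1", "b2"}, {"b1"}), ({"b3"}, {"b3", "b4"})]
print(averaged_measures(books))  # (0.75, 0.75)
```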

Page 17:

Results: for ALL Strategy

Page 18:

Results: Rich vs. Random Training Set

• Rich does not improve the results much

• Bias towards richly annotated books

• Jaccard's performance goes down

• LSA does better

• Statistical corrections allow simple grouping techniques to cope with data complexity

Page 19:

Results: for Clustering

Page 20:

Results: Jaccard vs. LSA

• For 3 and ALL, LSA outperforms Jaccard

• For 1 and 2, Jaccard outperforms LSA

• Simple similarity is better at finding explicit similarities
  • Those really occurring in books

• LSA is better at finding potential similarities

Page 21:

Results: using LSA

Page 22:

Results: Clustering vs. Ranking

• Clustering performs better on strategies 1 and 2
  • It matches existing annotations better
  • It has better precision

• Ranking has higher recall but lower precision

Classical tradeoff (ranking keeps noise)

Page 23:

Agenda

• The multi-concept alignment problem

• The Library case and the need for MCA

• Generating MCAs for the Library case

• Evaluating MCAs in the Library case

• Conclusion

Page 24:

Conclusions

• There is an important problem: multi-concept alignment
  • Not extensively dealt with in the current literature
  • Needed by applications

• We have first approaches to create such alignments
  • And to deploy them!

• We hope that further research will improve the situation (with our ‘deployer’ hat on)
  • Better alignments
  • More precise frameworks (methodology research)

Page 25:

Conclusions: performances

• Evaluation shows mixed results
  • Performance is generally very low
  • These techniques cannot be used alone

• Notice: dependence on requirements
  • Settings where a manual indexer chooses among several candidates allow for lower precision

• Notice: indexing variability
  • OAEI campaigns have demonstrated that manual evaluation somewhat compensates for the bias of automatic evaluation

Page 26:

Thanks!