
Post on 15-Jan-2016

Learning to Match Ontologies on the Semantic Web

AnHai Doan, Jayant Madhavan, Robin Dhamankar, Pedro Domingos, Alon Halevy

GLUE

Identifies Mappings between websites

Uses Machine Learning

Uses Common Sense Knowledge

Domain Constraints

Motivation

Data comes from different ontologies
Answers come from multiple web pages
Manual mapping: very tedious, error-prone, not very scalable

Outline

Overview of GLUE
GLUE Architecture
Case Studies
CGLUE
Case Studies
Conclusion
Assessment

Overview

• Assumes 2 Ontologies
• 1-1 Matching
• Similarity between two Concepts
• Computing Joint Distribution
• P(A,B), P(A,~B), P(~A,B), P(~A,~B)
• Machine Learning
• Multistrategy Learning
• Exploiting Domain Constraints
• Data Instances
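The four joint probabilities listed above can be estimated by counting instances. The sketch below is only an illustration: the function name, the integer instance IDs, and the toy sets are assumptions, and it presumes that membership of every instance in both concepts A and B is already known.

```python
def joint_distribution(universe, in_a, in_b):
    """Estimate P(A,B), P(A,~B), P(~A,B), P(~A,~B) by counting instances.

    universe: set of all instances; in_a / in_b: subsets belonging to A / B.
    """
    n = len(universe)
    a_and_b = len(in_a & in_b)
    a_not_b = len(in_a - in_b)
    b_not_a = len(in_b - in_a)
    neither = n - a_and_b - a_not_b - b_not_a
    return {
        "P(A,B)": a_and_b / n,
        "P(A,~B)": a_not_b / n,
        "P(~A,B)": b_not_a / n,
        "P(~A,~B)": neither / n,
    }


# Toy data (hypothetical): ten instances identified by integers.
universe = set(range(10))
in_a = {0, 1, 2, 3}
in_b = {2, 3, 4}
dist = joint_distribution(universe, in_a, in_b)
```

The four values always sum to 1, so any similarity measure defined on the joint distribution can be computed from this one dictionary.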

Overview

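[Diagram: GLUE architecture. Taxonomy O1 and Taxonomy O2 feed the Distribution Estimator (base learners L1 … Lk combined by Meta Learner M), which outputs joint distributions. The Similarity Estimator applies a similarity function to produce a similarity matrix. The Relaxation Labeler combines the similarity matrix with common knowledge and domain constraints to produce mappings for the taxonomies.]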

Distribution Estimator

[Diagram: Taxonomy O1 and Taxonomy O2 feed base learners L1 … Lk, whose outputs are combined by Meta Learner M to produce joint distributions.]

Distribution Estimator

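[Diagram: a taxonomy with nodes R, A, C, D, E, F and instances t1 to t7. The instances are divided into two sets, {t1, t2, t3, t4} and {t5, t6, t7}, from which Learner L is trained.]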

Distribution Estimator

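[Diagram: a second taxonomy with nodes G, H, B, I, J and instances s1 to s6, first divided into {s1, s2, s3, s4} and {s5, s6}. Learner L is applied to these instances, yielding the sets {s1, s3}, {s2, s4}, {s5}, {s6}.]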
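The slides above appear to show a learner trained on one taxonomy's instances and then applied to the other taxonomy's instances, so that each instance falls into one of the four (A, B) cells. A minimal sketch of that idea follows. The keyword-counting learner is a hypothetical stand-in (the slides do not specify the learner here), all instance strings are invented, and only the one direction drawn on the slides is covered.

```python
from collections import Counter


def train_keyword_learner(positives, negatives):
    """Hypothetical stand-in learner: predicts True when an instance shares
    more word occurrences with the positive examples than the negative ones."""
    pos_words = Counter(w for text in positives for w in text.split())
    neg_words = Counter(w for text in negatives for w in text.split())

    def predict(text):
        words = text.split()
        return sum(pos_words[w] for w in words) > sum(neg_words[w] for w in words)

    return predict


def cross_classify(instances, known_in_b, predicts_a):
    """Count instances per (in A?, in B?) cell.

    Membership in B is read from the taxonomy; membership in A is predicted.
    """
    cells = Counter()
    for text in instances:
        cells[(predicts_a(text), text in known_in_b)] += 1
    return cells


# First taxonomy: examples inside and outside concept A (invented data).
a_pos = ["professor of databases", "associate professor"]
a_neg = ["graduate course on databases", "undergraduate course"]
learner_a = train_keyword_learner(a_pos, a_neg)

# Second taxonomy: instances, and the subset filed under concept B.
o2 = ["assistant professor", "professor emeritus", "course schedule", "seminar course"]
in_b = {"assistant professor", "professor emeritus"}

cells = cross_classify(o2, in_b, learner_a)
p_a_and_b = cells[(True, True)] / len(o2)
```

Dividing each cell count by the number of instances gives an estimate of the corresponding joint probability over that taxonomy's instances.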

Multistrategy Learning

Base Learners
Content Learner: Frequency, Naïve Bayes
Name Learner: Full Name, Specific and Descriptive
Element MetaLearner
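The Content Learner entry mentions word frequencies and Naïve Bayes. As a rough illustration of how those fit together, here is a small multinomial Naïve Bayes text classifier with Laplace smoothing. The class name, the whitespace tokenization, and the training strings are assumptions made for this sketch and are not taken from GLUE itself.

```python
import math
from collections import Counter


class NaiveBayesContentLearner:
    """Multinomial Naive Bayes over word frequencies (illustrative sketch)."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training instances
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            words = text.lower().split()
            self.word_counts.setdefault(label, Counter()).update(words)
            self.doc_counts[label] += 1
            self.vocab.update(words)
        return self

    def predict_proba(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)
            for w in words:
                # Laplace (add-one) smoothing so unseen words do not zero out a label.
                score += math.log((counts[w] + 1) / (total_words + len(self.vocab)))
            log_scores[label] = score
        # Convert log scores to normalized probabilities.
        top = max(log_scores.values())
        exp_scores = {lab: math.exp(s - top) for lab, s in log_scores.items()}
        z = sum(exp_scores.values())
        return {lab: v / z for lab, v in exp_scores.items()}


learner = NaiveBayesContentLearner().fit(
    [
        "associate professor of computer science",
        "professor of databases",
        "introduction to databases course",
        "course on operating systems",
    ],
    ["Faculty", "Faculty", "Course", "Course"],
)
probs = learner.predict_proba("professor of networks")
```

With this toy training set, "professor of networks" scores higher for Faculty than for Course because "professor" and "of" occur only in the Faculty examples.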

MetaLearner

Combines the predictions of the base learners
Gives each learner a weight

User Input
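One simple way to combine weighted base learners is a weighted average of their scores. The sketch below assumes that form; the function name, learner names, scores, and weights are all invented for illustration, and the slides do not say how GLUE's meta-learner actually computes its output.

```python
def meta_learner(predictions, weights):
    """Weighted average of base-learner scores.

    predictions: learner name -> {label: score}
    weights:     learner name -> weight
    """
    total_weight = sum(weights[name] for name in predictions)
    combined = {}
    for name, scores in predictions.items():
        for label, score in scores.items():
            combined[label] = combined.get(label, 0.0) + weights[name] * score
    return {label: value / total_weight for label, value in combined.items()}


# Hypothetical outputs of two base learners for one instance.
predictions = {
    "content": {"A": 0.8, "not A": 0.2},
    "name": {"A": 0.3, "not A": 0.7},
}
weights = {"content": 0.6, "name": 0.4}  # assumed weights
combined = meta_learner(predictions, weights)
```

Here the content learner is trusted more, so the combined score for A (0.6) sits closer to its prediction than to the name learner's.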

[Diagram: joint distributions and a similarity function feed the Similarity Estimator, which outputs a similarity matrix.]

Similarity Estimator

Applies a similarity function supplied by the user
Example: Jaccard-sim
Outputs a similarity matrix between concepts
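The Jaccard coefficient of two concepts is P(A and B) divided by P(A or B), which can be written with three of the four joint probabilities. The sketch below computes it and fills a small similarity matrix. The concept names and probability values are invented, and the dictionary layout is an assumption of this example.

```python
def jaccard_sim(p_ab, p_a_not_b, p_not_a_b):
    """Jaccard coefficient: P(A,B) / (P(A,B) + P(A,~B) + P(~A,B))."""
    union = p_ab + p_a_not_b + p_not_a_b
    return p_ab / union if union > 0 else 0.0


def similarity_matrix(joints):
    """joints: {(concept_1, concept_2): (P(A,B), P(A,~B), P(~A,B))}."""
    return {pair: jaccard_sim(*probs) for pair, probs in joints.items()}


# Hypothetical joint probabilities for two concept pairs.
joints = {
    ("Faculty", "Academic Staff"): (0.2, 0.05, 0.05),
    ("Faculty", "Courses"): (0.0, 0.25, 0.4),
}
matrix = similarity_matrix(joints)
```

The value is 0 for disjoint concepts and 1 when the two concepts cover exactly the same instances.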

Where are we?

Find Similarities

Compute Similarities

Satisfy Constraints

Relaxation Labeler

Relaxation Labeler

[Diagram: the similarity matrix, common knowledge, and domain constraints feed the Relaxation Labeler, which outputs mappings for the taxonomies.]

Constraints

Domain-Independent: General Knowledge
Domain-Dependent: Interaction between two nodes
Model each constraint as a feature f()

Domain Independent

Relaxation Labeler

Searches for best mapping given constraints

A node's label is influenced by its “neighborhood”

Performs local optimization

Local Optimization

1. Assigns initial labels
2. Performs optimization
3. Uses a formula to change a label
4. Repeat 2-3
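The four steps above can be sketched as a small loop. The scoring rule used here (similarity plus a fixed bonus for each neighbor whose current label is adjacent to the candidate label) is a hypothetical stand-in for the formula on the slide, and the function name, parameters, and toy taxonomies are all assumptions.

```python
def relaxation_label(sim, neighbors_1, neighbors_2, bonus=0.3, max_rounds=10):
    """Iteratively relabel nodes until no label changes.

    sim:         node -> {label: similarity}
    neighbors_1: node -> set of neighboring nodes in the first taxonomy
    neighbors_2: label -> set of neighboring labels in the second taxonomy
    """
    # Step 1: initial labels are the most similar label for each node.
    labels = {node: max(row, key=row.get) for node, row in sim.items()}

    for _ in range(max_rounds):  # Step 4: repeat steps 2-3
        changed = False
        for node, row in sim.items():
            def score(label):
                # Neighborhood support: neighbors whose labels sit next to `label`.
                support = sum(
                    1
                    for nb in neighbors_1.get(node, ())
                    if labels[nb] in neighbors_2.get(label, ())
                )
                return row[label] + bonus * support

            best = max(row, key=score)  # Steps 2-3: re-score and maybe change label
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


# Toy example (invented): two nodes in one taxonomy, three labels in the other.
sim = {
    "Faculty": {"Staff": 0.7, "Lecturer": 0.2, "Course": 0.1},
    "Professor": {"Staff": 0.1, "Lecturer": 0.4, "Course": 0.45},
}
neighbors_1 = {"Faculty": {"Professor"}, "Professor": {"Faculty"}}
neighbors_2 = {"Staff": {"Lecturer"}, "Lecturer": {"Staff"}}
labels = relaxation_label(sim, neighbors_1, neighbors_2)
```

In this example "Professor" starts out labeled "Course" on similarity alone, then switches to "Lecturer" because its neighbor "Faculty" is labeled "Staff", which is adjacent to "Lecturer".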

Local Optimization

Node in taxonomy O1
Label in taxonomy O2
Everything we know
Other label assignments to all nodes besides X
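These four phrases read like annotations on a formula that did not survive in the transcript. One plausible reconstruction is given below, under assumed notation: X is the node, L the label, \Delta_K everything we know, and M_X the label assignments to all nodes other than X. Only X appears in the transcript; the other symbols are assumptions. The identity itself is just marginalization over M_X followed by the chain rule.

```latex
P(X = L \mid \Delta_K)
  = \sum_{M_X} P(X = L,\, M_X \mid \Delta_K)
  = \sum_{M_X} P(X = L \mid M_X,\, \Delta_K)\; P(M_X \mid \Delta_K)
```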

Local Optimization

Where are we?

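[Diagram: the GLUE architecture, repeated. Taxonomy O1 and Taxonomy O2 pass through base learners L1 … Lk and Meta Learner M to give joint distributions, the Similarity Estimator turns these into a similarity matrix, and the Relaxation Labeler uses the matrix with common knowledge and domain constraints to produce mappings for the taxonomies.]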

Case Study

• University Catalogs
• Business Profiles
• For each one
• Entire set of data instances
• Cleaned it up

Results

Improvements

Insufficient Training Data
Local Optimization
Additional Base Learners
Ambiguous Best Match

CGLUE

CGLUE

Beam Search
Uses structure and data
No relaxation labeling (no constraints)
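The slide names beam search but does not say what CGLUE searches over or how candidates are scored. As a generic illustration only, the sketch below assumes the search is over node-to-label mappings and that each assignment is scored from a made-up similarity table. It keeps the best few partial mappings at each step instead of exploring all of them.

```python
def beam_search_mapping(nodes, labels, score, beam_width=2):
    """Assign nodes one at a time, keeping only the beam_width best partial mappings.

    score(node, label) returns a number; higher is better.
    Returns (best_mapping, best_total_score).
    """
    beam = [({}, 0.0)]
    for node in nodes:
        candidates = []
        for mapping, total in beam:
            for label in labels:
                extended = dict(mapping)
                extended[node] = label
                candidates.append((extended, total + score(node, label)))
        # Keep only the highest-scoring partial mappings.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beam = candidates[:beam_width]
    return beam[0]


# Hypothetical similarity table used as the scoring function.
sim = {
    "Faculty": {"Staff": 0.9, "Class": 0.1},
    "Course": {"Staff": 0.2, "Class": 0.8},
}
best_mapping, best_total = beam_search_mapping(
    ["Faculty", "Course"], ["Staff", "Class"], lambda n, l: sim[n][l]
)
```

A wider beam explores more alternatives at higher cost, while a beam width of 1 reduces to a greedy choice per node.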

CGLUE Case Study

Improvements

Incorporate Domain Constraints
Object Identification

Conclusion

Semantic Similarity
Multistrategy Learning
Relaxation Labeling
CGLUE

Assessment

Data Instances
Additional Sites?
CGLUE
Future Work
