learning to map between ontologies on the semantic web anhai doan, jayant madhavan, pedro domingos,...

19
Learning to Map between Learning to Map between Ontologies on the Semantic Ontologies on the Semantic Web Web AnHai Doan, Jayant Madhavan , Pedro Domingos, and Alon Halevy Databases and Data Mining group University of Washington

Upload: keila-doidge

Post on 14-Dec-2015

215 views

Category:

Documents


0 download

TRANSCRIPT

Learning to Map between Learning to Map between Ontologies on the Semantic Ontologies on the Semantic WebWeb

AnHai Doan, Jayant Madhavan,Pedro Domingos, and Alon Halevy

Databases and Data Mining group

University of Washington

Semantic WebSemantic Web

Mark-up data on the web using ontologies

Enable intelligent information processing over the web Personal software agents Queries over multiple web pages …

An ExampleAn Example

Semantic Mappings allow information processing across Semantic Mappings allow information processing across ontologiesontologies

www.cs.washington.edu www.cs.usyd.edu.au

James CookPhD, U Sydney

Data Instance

Find Prof. Cook, a professor in a Seattle college, earlier an assoc. professor at his alma mater in Australia

Semantic Mapping

People

Staff Faculty

Professor Assoc. Professor

Asst. Professor

Academic Technical

Professor SeniorLecturer

Lecturer

Staff

NameEducation

… …

Semantic Web: State of the ArtSemantic Web: State of the Art

Languages for ontologies RDF, DAML+OIL,…

Ontology learning and Ontology design tools [Maedche’02], Protégé, Ontolingua,…

Semantic Mappings crucial to the SW vision [Uscold’01, Berners-Lee, et al.’01]

Without semantic mappings…Tower of Without semantic mappings…Tower of Babel !!!Babel !!!

Semantic Mapping Semantic Mapping ChallengesChallenges

Ontologies can be very different Different vocabularies, different design principles Overlap, but not coincide

Semantic Mapping information Data instances marked up with ontologies Concept names and taxonomic structure Constraints on the mapping

OverviewOverview

People

Staff Faculty

Professor

Assoc. Professo

r

Asst. Professo

r

Academic Technical

Professor

SeniorLecturer

Lecturer

Staff

Faculty

Define Similarity

Sim(Fac,Staff

)Sim(Fac,Prof)

Sim(Fac,Acad)

ComputeSimilarity

SatisfyConstraints

Academic?

Our ContributionsOur Contributions An automatic solution to taxonomy matching

Handles different similarity notions Exploits information in data instances and taxonomic

structure, using multi-strategy learning

Extend solution to handle wide variety of constraints, using Relaxation Labeling

An implementation, our GLUEGLUE system, and experiments on real-world taxonomies

High accuracy (68-98%) on large taxonomies (100-330 concepts)

Defining Similarity Defining Similarity

Multiple Similarity measures in terms of the JPDMultiple Similarity measures in terms of the JPD

Assoc. Prof Snr. Lecturer

A,S

A, S

A,S

A,S

P(A,S) + P(A,S) + P(A,S)

P(A,S)=

Joint Probability Distribution: P(A,S),P(A,S),P(A,S),P(A,S)

HypotheticalCommonMarked updomain

P(A S)

P(A S)Sim(Assoc. Prof., Snr. Lect.) =

[Jaccard, 1908]

No common data instancesNo common data instances

In practice, not easy to find data tagged with both ontologies !

United States Australia

Solution: Use Machine LearningSolution: Use Machine Learning

A

A

S

S

Machine Learning for computing Machine Learning for computing similaritiessimilarities

JPD estimated by counting the sizes of the partitionsJPD estimated by counting the sizes of the partitions

CLS

S

S

United States AustraliaA

A

S

S

CLA

A

A

A,S A,S

A,S A,S

A,S A,S

A,S A,S

Improve Predictive Accuracy – Use Improve Predictive Accuracy – Use Multi-Strategy Learning Multi-Strategy Learning

Single Classifier cannot exploit all available information

Combine the prediction of multiple classifiersCombine the prediction of multiple classifiers

CLA1

A

A

A

A

CLAN

A

A

…Content Learner

Frequencies on different words in the text in the data instances

Name LearnerWords used in the names of concepts in the taxonomy

Others …

Meta-Learner

So far…So far…

Define Similarity

ComputeSimilarity

SatisfyConstraints

Joint ProbabilityDistribution

Multi-strategyLearning

Constraints due to the taxonomy structure

Domain specific constraints Department-Chair can only map to a unique

concept

Numerous constraints of different types

StaffPeople

Next Step: Exploit ConstraintsNext Step: Exploit Constraints

Staff Fac

Prof Assoc. Prof Asst. Prof

Acad Tech

Prof Snr. Lect. Lect.

PeoplePeople StaffStaffParentsParents

ChildrenChildren

Extended Relaxation Labeling to ontology matchingExtended Relaxation Labeling to ontology matching

Solution: Relaxation LabelingSolution: Relaxation Labeling

Find the best label assignment given a set of constraints

Start with an initial label assignment Iteratively improves labels, given constraints

Standard Relaxation Labeling not applicable Extended in many ways

Acad

StaffPeople

Staff Fac

Prof Assoc. Prof Asst. Prof

Tech

Prof Snr. Lect. Lect.

PeoplePeople

ProfProf Assoc. Assoc. ProfProf

Asst. Asst. ProfProf

FacFac

ProProff

?

StaffStaff

Snr. Lect.Snr. Lect. Lect.Lect.

StaffStaff

AcadAcad

ProProff

Snr. Lect.Snr. Lect. Lect.Lect.

?

Distribution Estimator

Joint Distributions:P(A,B),P(A,B),…

Taxonomy O2

(structure + data instances)Taxonomy O1

(structure + data instances)

Putting it all together Putting it all together GLUE System GLUE System

Relaxation Labeler

Generic & Domain constraints

Mappings for O1 , Mappings for O2

Similarity Estimator

Similarity Matrix

Similarity function

DistributionEstimator

Meta Learner

Learner CL1 Learner CLN

Real World ExperimentsReal World Experiments Taxonomies on the web

University classes (UW and Cornell) Companies (Yahoo and The Standard)

For each taxonomy Extracted data instances – course descriptions, and company

profiles Trivial data cleaning 100 – 300 concepts per taxonomy 3-4 depth of taxonomies 10-90 average data instances per concept

Evaluation against manual mappings as the gold standard

ResultsResults

0

10

20

30

40

50

60

70

80

90

100

Cornell to Wash. Wash. to Cornell Cornell to Wash. Wash. to Cornell Standard to Yahoo Yahoo to Standard

Mat

chin

g a

ccu

racy

(%)

Name Learner Content Learner Meta Learner Relaxation Labeler

University I University II Companies

Related WorkRelated Work Our LSD schema matching system [Doan,

Domingos, Halevy ’01] GLUE handles taxonomies, richer models, and a

much richer set of constraints

Other Ontology and Schema Matching work [Noy, Musen’01], [Melnik, et al.’02], [Ichise, et al.’01] Mostly heuristics, or single machine learning

techniques

Relaxation Labeling for constraint satisfaction [Hummel, Zucker’83], [Chakrabarti, et al.’00] Significantly extend this approach

Conclusions & Future WorkConclusions & Future Work

An automated solution to taxonomy matching Handles multiple notions of similarity Exploits data instances and taxonomy structure Incorporates generic and domain-specific constraints Produces high accuracy results

Future Work More expressive models Complex Mappings Automated reasoning about mappings between

models