Semantic Matching

Pavel Shvaiko

Stanford University, October 31, 2003

Paper with Fausto Giunchiglia

Research group (alphabetically ordered): Fausto Giunchiglia, Pavel Shvaiko, Mikalai Yatskevich, Ilya Zaihrayeu



Outline

Matching

Syntactic Matching

Semantic Matching

On Implementing Semantic Matching

Conclusions


MATCHING


Application Domains

Generic Model Management

Schema integration

Data warehouses

E-commerce

Data Coordination in P2P systems, Semantic Web


Example of Matching

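[Figure: two classification hierarchies to be matched. www.google.com: Arts, Organizations, Art History, Music, Baroque, History. www.yahoo.com: Arts&Humanities, Organizations, Art History, Design Art, Baroque, Architecture, History. Correspondences between nodes are annotated with Sc = 0.9 and Sc = 1.0 and with relations Sr; the Sr symbols and the tree layout were lost in extraction.]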


Matching

Match is an operator that takes two graph-like structures (e.g., database schemas or ontologies) and produces a mapping between elements of the two graphs that correspond semantically to each other


Matching

The problem of matching can be decomposed in two steps:

Extract graphs from the data and conceptual models

Match the resulting graphs (generic matching)


Matching

Mapping element is a 4-tuple $\langle m_{ID}, N_i^1, N_j^2, R \rangle$, where

$m_{ID}$ is a unique identifier of the given mapping element;

$N_i^1$ is the i-th node of the first graph;

$N_j^2$ is the j-th node of the second graph;

$R$ specifies a similarity relation of the given nodes

Mapping is a set of mapping elements

Matching is the process of discovering mappings between two graphs through the application of a matching algorithm
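As a minimal sketch, the definitions above can be written down as a small data structure. The class and field names are hypothetical, and the choice to let R be either a number in [0,1] or a relation name is an assumption of this sketch rather than something the slide fixes.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union

# Assumed encoding of R: a coefficient in [0, 1] or one of these relation names.
RELATION_NAMES = {"equal", "overlap", "mismatch", "more_general", "less_general"}


@dataclass(frozen=True)
class MappingElement:
    """A 4-tuple <m_ID, node of graph 1, node of graph 2, R>."""
    m_id: str
    node1: str
    node2: str
    relation: Union[float, str]

    def __post_init__(self) -> None:
        r = self.relation
        if isinstance(r, str):
            if r not in RELATION_NAMES:
                raise ValueError(f"unknown relation name: {r!r}")
        elif not 0.0 <= r <= 1.0:
            raise ValueError(f"coefficient out of range: {r!r}")


# A mapping is a set of mapping elements.
Mapping = FrozenSet[MappingElement]

mapping: Mapping = frozenset({
    MappingElement("m21", "telephone", "phone", 0.7),
    MappingElement("m22", "telephone", "phone", "equal"),
})
```

Making the element immutable lets a mapping be an ordinary set, which matches the definition of a mapping as a set of mapping elements.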


Matching

Matching: Syntactic AND Semantic

Syntactic Matching

• R is computed between labels at nodes

• $R = \{x \in [0,1]\}$

Semantic Matching

• R is computed between concepts at nodes

• $R = \{=, \cap, \perp, \supseteq, \subseteq\}$


SYNTACTIC MATCHING


Syntactic Matching

Mapping element is a 4-tuple $\langle m_{ID}, L_i^1, L_j^2, R \rangle$, where

$L_i^1$ is the label at the i-th node of the first graph;

$L_j^2$ is the label at the j-th node of the second graph;

$R$ specifies a similarity relation in the form of a coefficient, which measures the similarity between the labels of the given nodes

Example: R is a similarity coefficient in [0,1]

< m21, telephone, phone, 0.7 >
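A minimal sketch of how such a coefficient might be computed from the two labels alone. The slide does not say which measure produces 0.7; the ratio from Python's standard `difflib.SequenceMatcher` is used here only as a stand-in, and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def label_similarity(label1: str, label2: str) -> float:
    """Similarity in [0, 1] based only on the characters of the two labels."""
    return SequenceMatcher(None, label1.lower(), label2.lower()).ratio()


def syntactic_mapping_element(m_id: str, label1: str, label2: str) -> tuple:
    """Build a 4-tuple <m_ID, label 1, label 2, R> with R a rounded coefficient."""
    coefficient = round(label_similarity(label1, label2), 2)
    return (m_id, label1, label2, coefficient)


element = syntactic_mapping_element("m21", "telephone", "phone")
print(element)
```

The measure looks only at the strings, so it cannot tell whether two similar labels denote the same concept.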


The State of the Art

Cupid… is a hybrid matching prototype. It exploits linguistic and structural schema matching heuristics, and computes similarity coefficients between nodes of the trees.

Similarity Flooding… is a hybrid matching prototype. It uses fix-point computation to determine correspondences between nodes of the graphs.

COMA… is a composite matching prototype. It provides an extensible library of different matchers, which manipulate DAGs, and supports various ways of combining final results.

As far as we know, so far only syntactic matching…


SEMANTIC MATCHING


Semantic Matching  

Mapping element is a 4-tuple $\langle m_{ID}, C_i^1, C_j^2, R \rangle$, where

$C_i^1$ is the concept of the i-th node of the first graph;

$C_j^2$ is the concept of the j-th node of the second graph;

$R$ specifies a similarity relation in the form of a semantic relation between the extensions of concepts at the given nodes

Possible R’s: equality $\{=\}$, overlapping $\{\cap\}$, mismatch $\{\perp\}$, more general/specific $\{\supseteq, \subseteq\}$

Example: < m21, telephone, phone, {=} >


Examples: Analysis of Ancestors. Case 1

Suppose that we want to match node 5 of the first graph and node 1 of the second graph

[Figure: the two graphs to be matched, with nodes numbered 1 to 5, labels A, B, C, D, E and is-a edges; the layout was lost in extraction.]

Cupid does not find a similarity coefficient between the nodes under consideration, due to the significant differences in the structure of the given graphs

Semantic matching: The concept denoted by the label at node 5 of the first graph is $C_C^1$, while the concept at that node is $C_5^1 = C_A^1 \cap C_C^1$. The concept at node 1 of the second graph is $C_1^2 = C_C^2$. Thus, $C_5^1 \subseteq C_1^2$


Examples: Analysis of Ancestors. Case 2

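Suppose that we want to match node 5 of the first graph and node 5 of the second graph

[Figure: the two graphs to be matched, with numbered nodes, labels A, B, C, D, E, is-a edges and one node marked *; the layout was lost in extraction.]

Cupid: R = 0.86. This is because of the identity of the labels $A^1 = A^2$, $C^1 = C^2$

Semantic matching: The concept at node 5 of the first graph is $C_5^1 = C_A^1 \cap C_C^1$, while the concept at node 5 of the second graph is $C_5^2 = C_A^2 \cap * \cap C_C^2$.

Since we have that $C_A^1 = C_A^2$ and $C_C^1 = C_C^2$, then $C_5^2 \subseteq C_5^1$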


ON IMPLEMENTING SEMANTIC MATCHING


On Implementation

Semantic Matching

Structure-level

Element-level

Weak Semantics Techniques

Strong Semantics Techniques


Element-level Semantic Matching

Weak Semantics Techniques

Analysis of strings {=}: <phone, telephone, {=}>

Analysis of data types $\{=, \cap, \perp, \supseteq, \subseteq\}$: <string, integer, {…}>, <integer, real, {…}>

Analysis of soundex {=}: <Fausto, Phausto, {=}>

Strong Semantics Techniques

Precompiled thesaurus: syn key <Discount, Rebate, {=}>

WordNet: <Art_#1, Humanities_#1, {…}>, where #1 denotes sense number 1 of the word Art according to WordNet


Element-level Semantic Matching (cont.)

Semantic Relations via WordNet

Equality: one concept is equal to another if there is at least one sense of the first concept which is a synonym of the second

Overlapping: one concept overlaps with the other if they have some senses in common

Mismatch: two concepts are mismatched if they have no sense in common

More general: one concept is more general than the other iff there exists at least one sense of the first concept that has a sense of the other as a hyponym or as a meronym

Less general: one concept is less general than the other iff there exists at least one sense of the first concept that has a sense of the other concept as a hypernym or as a holonym


Structure-level Semantic Matching

We translate the matching problem, namely the two graphs (in particular, the pair of nodes submitted to matching) into a propositional formula and then check for its validity

We check for validity using SAT


Semantic Matching Algorithm

1. Extract the two graphs

2. Compute element-level semantic matching

3. Compute concepts at nodes

4. Construct the propositional formula

5. Run SAT

6. Perform iterations


Semantic Matching Algorithm: Example – (1)

Extract the two graphs

[Figure: the two extracted graphs, with nodes numbered 1 to 5, labels A, B, C, D, E and is-a edges; the layout was lost in extraction.]

• In the case of RDB, XML and OODB schemas, it is necessary to extract useful semantic information, for instance in the form of ontologies


Semantic Matching Algorithm: Example – (2)

Element-level semantic matching. For each node, compute the semantic relations holding among all the concepts denoted by labels at the nodes under consideration

[Figure: the two graphs, with nodes numbered 1 to 5, labels A, B, C, D, E and is-a edges; the layout was lost in extraction.]

$C_A^1 = C_A^2$

$C_B^1 = C_B^2$

$C_C^1 = C_C^2$

$C_D^1 = C_D^2$

$C_E^1 = C_E^2$


Semantic Matching Algorithm: Example – (3)

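Compute concepts at nodes. Suppose we want to find a semantic relation between node 5 of the first graph and node 1 of the second graph

[Figure: the two graphs, with nodes numbered 1 to 5, labels A, B, C, D, E and is-a edges; the pair of nodes under consideration is marked with "?". The layout was lost in extraction.]

$C_1^1 = C_A^1$

$C_5^1 = C_A^1 \cap C_C^1$

$C_1^2 = C_C^2$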
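A minimal sketch of this step, assuming (as the example suggests) that the concept at a node is the intersection of the concepts denoted by the labels on the path from the root to that node. The graph fragments below are hypothetical: they contain only the nodes that appear in the formulas, and the parent link between nodes 1 and 5 of the first graph is an assumption.

```python
from typing import Dict, List, Optional, Tuple

# node id -> (label, parent node id or None for the root); assumed fragments.
Graph = Dict[int, Tuple[str, Optional[int]]]

GRAPH1: Graph = {1: ("A", None), 5: ("C", 1)}
GRAPH2: Graph = {1: ("C", None)}


def concept_at_node(graph: Graph, node: int, graph_id: int) -> List[str]:
    """Return the conjuncts of the concept at a node, ordered from the root down."""
    conjuncts: List[str] = []
    current: Optional[int] = node
    while current is not None:
        label, parent = graph[current]
        conjuncts.append(f"C_{label}^{graph_id}")
        current = parent
    return list(reversed(conjuncts))


c51 = concept_at_node(GRAPH1, 5, 1)  # conjuncts of the concept at node 5, graph 1
c12 = concept_at_node(GRAPH2, 1, 2)  # conjuncts of the concept at node 1, graph 2
print(" ∩ ".join(c51), "|", " ∩ ".join(c12))
```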


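Semantic Matching Algorithm: Example – (4)

Construct the propositional formula. We translate all the semantic relations computed in step 2 into propositional formulas under the following rules:

$C_A^1 \supseteq C_A^2 \;\Rightarrow\; C_A^2 \rightarrow C_A^1$

$C_A^1 \subseteq C_A^2 \;\Rightarrow\; C_A^1 \rightarrow C_A^2$

$C_A^1 = C_A^2 \;\Rightarrow\; C_A^1 \leftrightarrow C_A^2$

$C_A^1 \perp C_A^2 \;\Rightarrow\; \neg(C_A^1 \wedge C_A^2)$

[Figure: the two graphs, with nodes numbered 1 to 5, labels A, B, C, D, E and is-a edges; the pair of nodes under consideration is marked with "?". The layout was lost in extraction.]

From step 2 we have: $C_C^1 \leftrightarrow C_C^2$.

We want to prove that $C_5^1 \subseteq C_1^2$ (we guess the relation between the nodes at this stage):

$(C_A^1 \wedge C_C^1) \rightarrow C_C^2$

$(C_C^1 \leftrightarrow C_C^2) \rightarrow ((C_A^1 \wedge C_C^1) \rightarrow C_C^2)$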
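The translation can be sketched as a small lookup from relation names to formula templates. This is an illustration under assumptions: the relation names, the ASCII connectives (`->`, `<->`, `&`, `~`) and the function names are all hypothetical, and the formulas are built as plain strings.

```python
from typing import List


def translate(relation: str, left: str, right: str) -> str:
    """Translate 'left <relation> right' into a propositional formula string."""
    rules = {
        "more_general": f"({right} -> {left})",   # left ⊇ right
        "less_general": f"({left} -> {right})",   # left ⊆ right
        "equal": f"({left} <-> {right})",         # left = right
        "mismatch": f"~({left} & {right})",       # left ⊥ right
    }
    return rules[relation]


def matching_formula(axioms: List[str], goal: str) -> str:
    """Element-level relations (axioms) imply the guessed relation (goal)."""
    context = " & ".join(axioms)
    return f"({context}) -> {goal}"


# Step 2 gives C_C^1 = C_C^2; the guessed relation is C_5^1 ⊆ C_1^2,
# with C_5^1 = C_A^1 ∩ C_C^1 and C_1^2 = C_C^2.
axiom = translate("equal", "C_C^1", "C_C^2")
goal = translate("less_general", "(C_A^1 & C_C^1)", "C_C^2")
formula = matching_formula([axiom], goal)
print(formula)
```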


Semantic Matching Algorithm: Example – (5)

Run SAT

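In order to prove that $(C_C^1 \leftrightarrow C_C^2) \rightarrow ((C_A^1 \wedge C_C^1) \rightarrow C_C^2)$ is valid, we prove that its negation is unsatisfiable:

$(C_C^1 \leftrightarrow C_C^2) \wedge \neg((C_A^1 \wedge C_C^1) \rightarrow C_C^2)$

SAT returns FALSE

Thus, $C_5^1 \subseteq C_1^2$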
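For a formula this small, the check can be reproduced by enumerating all truth assignments. The sketch below uses exhaustive enumeration as a stand-in for a SAT solver (the slides do not name one); the function names are hypothetical.

```python
from itertools import product
from typing import Callable


def implies(p: bool, q: bool) -> bool:
    return (not p) or q


def negated_formula(ca1: bool, cc1: bool, cc2: bool) -> bool:
    """(C_C^1 <-> C_C^2) and not ((C_A^1 and C_C^1) -> C_C^2)."""
    axiom = cc1 == cc2
    goal = implies(ca1 and cc1, cc2)
    return axiom and not goal


def satisfiable(formula: Callable[..., bool], n_vars: int) -> bool:
    """True if some assignment of the n_vars variables makes the formula true."""
    return any(formula(*values) for values in product([False, True], repeat=n_vars))


is_sat = satisfiable(negated_formula, 3)
print(is_sat)  # False: the negation is unsatisfiable, so the formula is valid
```

Enumeration is exponential in the number of variables, which is why a real SAT solver would be used for larger graphs.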


Example: Cupid vs. Semantic Matching

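[Figure: the www.google.com hierarchy (Arts, Organizations, Art History, Music, Baroque, History) and the www.yahoo.com hierarchy (Arts&Humanities, Organizations, Art History, Design Art, Baroque, Architecture, History), with six correspondences annotated with semantic relations; the relation symbols and the tree layout were lost in extraction.]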


Conclusions

We have made a rational reconstruction of the major matching problems and articulated them in terms of the more generic problem of matching graphs

We have identified semantic matching as a new approach for performing generic matching

We have proposed an implementation of semantic matching using SAT


Future Work

Extend to a full graph matcher

How to extract semantics from schemas

Study how to take into account attributes and instances

Develop an efficient implementation of the system

Do a thorough testing of the system


References

Project website: http://www.dit.unitn.it/~p2p/

F. Giunchiglia, P. Shvaiko. “Semantic Matching”. Technical Report #DIT-03-013. Also to appear in The Knowledge Engineering Review journal. Short version in the proceedings of the Semantic Integration workshop at ISWC’03.

F. Giunchiglia, I. Zaihrayeu. “Making peer databases interact – a vision for an architecture supporting data coordination”. In Proc. of the Conference of Information Agents (CIA 2002), Madrid, 2002.