Page 1: presentation(ppt)

Containment of Partially Specified Tree-Pattern Queries

Dimitri Theodoratos (NJIT, USA)
Theodore Dalamagas (NTUA, GREECE)
Pawel Placek (NJIT, USA)
Stefanos Souldatos (NTUA, GREECE)
Timos Sellis (NTUA, GREECE)

Page 2: presentation(ppt)

Introduction
Data Model
Additional Concepts
Query Containment
Experiments
Conclusion

Page 3: presentation(ppt)

Stefanos Souldatos - HDMS 2006 3

Motivating Example

Tree structure (e.g. XML) with motorbike spare parts. We search for spare parts. BUT…

[Figure: example tree of motorbike spare parts rooted at r. Node labels: GREECE, USA, ATHENS, NJ, HONDA, YAMAHA, BMW, TRAVEL, ON-OFF, VARADERO, SERROW, F650, F650GS, 125cc, 200cc, 650cc, 1000cc.]

Page 4: presentation(ppt)

Motivating Example

Dimitri Theodoratos lives in NJ. He has a Yamaha Serrow motorbike in Greece. He searches for spare parts in Greece or the USA.

structural difference

[Figure: the spare-parts example tree, annotated with "?".]

Page 5: presentation(ppt)

Motivating Example

Theodore Dalamagas has a BMW motorbike. He looks for spare parts worldwide.

structural inconsistency: ../F650GS/650cc vs. ../650cc/F650GS

[Figure: the spare-parts example tree, annotated with the two paths.]

Page 6: presentation(ppt)

Motivating Example

Stefanos Souldatos has a Honda Varadero. But he is not fully aware of the tree structure.

unknown structure

[Figure: the spare-parts example tree.]

Page 7: presentation(ppt)

Motivating Example

Pawel Placek wants to buy a motorbike that he can easily find spare parts for. He searches in many different tree structures.

source integration

[Figure: three copies of the spare-parts example tree.]

Page 8: presentation(ppt)


Motivation

Querying tree-structured data

BUT

structure is not always strictly defined

user does not always deal with structure: Find Honda spare parts in Greece.

Page 9: presentation(ppt)

Introduction
Data Model
Additional Concepts
Query Containment
Experiments
Conclusion

Page 10: presentation(ppt)

Dimension Graph

[Figure: the spare-parts example tree next to its dimension graph over the nodes R, C, B, T, M, L, E.]

DIMENSIONS: R (Root), C (Country), B (Brand), T (Type), L (Location), M (Model), E (Engine)

dimension graph = summary of the tree structure
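
The idea of a dimension graph as a summary can be sketched in code. The sketch below assumes that the graph has an edge from dimension X to dimension Y whenever some tree node of dimension X has a child of dimension Y; that reading, the `Node` class, the `dimension_graph` function and the small tree fragment are all illustrative assumptions, not taken from the slides.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str       # value stored in the tree, e.g. "GREECE"
    dimension: str   # dimension the value belongs to, e.g. "C"
    children: list[Node] = field(default_factory=list)


def dimension_graph(root: Node) -> set[tuple[str, str]]:
    """Collect one (parent dimension, child dimension) edge per tree edge."""
    edges: set[tuple[str, str]] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            edges.add((node.dimension, child.dimension))
            stack.append(child)
    return edges


# Hypothetical fragment of the spare-parts tree (its shape is assumed).
tree = Node("r", "R", [
    Node("GREECE", "C", [
        Node("ATHENS", "L", [
            Node("HONDA", "B", [
                Node("TRAVEL", "T", [
                    Node("VARADERO", "M", [Node("125cc", "E"), Node("1000cc", "E")]),
                ]),
            ]),
        ]),
    ]),
    Node("USA", "C", [
        Node("BMW", "B", [
            Node("TRAVEL", "T", [
                Node("650cc", "E", [Node("F650GS", "M")]),
            ]),
        ]),
    ]),
])

graph = dimension_graph(tree)
for parent_dim, child_dim in sorted(graph):
    print(f"{parent_dim} -> {child_dim}")
```

In this fragment the engine appears below the model in one branch and above it in the other, so the summary contains both M -> E and E -> M.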

Page 11: presentation(ppt)

Partially Specified Tree-pattern Query

Query: Find shops with spare parts for all models and all engines of BMW motorbikes in Greece. (+ structural info)

[Figure: the dimension graph (R, C, B, T, M, L, E) next to the query, whose nodes carry the annotations C = {Greece}, B = {BMW}, M = ?, B = {BMW}, E = ?.]

Page 12: presentation(ppt)

Partially Specified Tree-pattern Query

Query: Find shops with spare parts for all models and all engines of BMW motorbikes in Greece. (+ structural info)

[Figure: the dimension graph (R, C, B, T, M, L, E) next to the query. The query consists of two partially specified paths (PSP), p1 and *p2, over the annotated nodes C = {Greece}, B = {BMW}, M = ?, B = {BMW}, E = ?.]

Page 13: presentation(ppt)

Partially Specified Tree-pattern Query

Query: Find shops with spare parts for all models and all engines of BMW motorbikes in Greece. (+ structural info)

[Figure: the dimension graph (R, C, B, T, M, L, E) next to the query. The query consists of two partially specified paths (PSP), p1 and *p2, over the annotated nodes C = {Greece}, B = {BMW}, M = ?, B = {BMW}, E = ?. Callouts: partially specified paths (PSP); output path (*).]

Page 14: presentation(ppt)

Partially Specified Tree-pattern Query

Query: Find shops with spare parts for all models and all engines of BMW motorbikes in Greece. (+ structural info)

[Figure: the dimension graph (R, C, B, T, M, L, E) next to the query. The query consists of two partially specified paths (PSP), p1 and *p2, over the annotated nodes C = {Greece}, B = {BMW}, M = ?, B = {BMW}, E = ?. Callouts: partially specified paths (PSP); output path (*); parent-child edge; ancestor-descendant edge.]

Page 15: presentation(ppt)

Partially Specified Tree-pattern Query

Query: Find shops with spare parts for all models and all engines of BMW motorbikes in Greece. (+ structural info)

[Figure: the dimension graph (R, C, B, T, M, L, E) next to the query. The query consists of two partially specified paths (PSP), p1 and *p2, over the annotated nodes C = {Greece}, B = {BMW}, M = ?, B = {BMW}, E = ?. Callouts: partially specified paths (PSP); output path (*); parent-child edge; ancestor-descendant edge; node sharing expression (NSE).]
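
The ingredients named on this slide (PSPs, annotated nodes, two kinds of edges, an output path, node sharing expressions) can be collected in a small data structure. This is only a sketch: the class names `QueryNode`, `PSP` and `PSTPQ` are hypothetical, and the assignment of nodes to p1 and p2, the edge kinds, and the sharing of B between the two paths are assumptions, since the figure itself is not recoverable from the transcript.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class QueryNode:
    dimension: str                      # e.g. "C", "B", "M"
    values: Optional[frozenset] = None  # None stands for the "?" annotation


@dataclass
class PSP:
    name: str
    nodes: list[QueryNode]
    # (upper dimension, lower dimension, kind); kind is "child" or "descendant"
    edges: list[tuple[str, str, str]] = field(default_factory=list)
    is_output: bool = False             # the path marked with "*"


@dataclass
class PSTPQ:
    paths: list[PSP]
    # node sharing expressions: (dimension, path name, path name)
    node_sharing: list[tuple[str, str, str]] = field(default_factory=list)

    def output_path(self) -> PSP:
        """Return the path flagged as output (raises StopIteration if none is)."""
        return next(p for p in self.paths if p.is_output)


# Assumed encoding of the example query.
p1 = PSP("p1",
         [QueryNode("C", frozenset({"Greece"})),
          QueryNode("B", frozenset({"BMW"})),
          QueryNode("M")],
         edges=[("C", "B", "descendant"), ("B", "M", "descendant")])
p2 = PSP("p2",
         [QueryNode("B", frozenset({"BMW"})), QueryNode("E")],
         edges=[("B", "E", "descendant")],
         is_output=True)
query = PSTPQ([p1, p2], node_sharing=[("B", "p1", "p2")])

print(query.output_path().name)
```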

Page 16: presentation(ppt)

Introduction
Data Model
Additional Concepts
Query Containment
Experiments
Conclusion

Page 17: presentation(ppt)

Additional Concepts

Full Form Query

[Figure: the query with PSP p1 and PSP *p2 (C = {Greece}, B = {BMW}, M = ?, B = {BMW}, E = ?), extended in its full form with an additional node C = {Greece}.]

Page 18: presentation(ppt)

Additional Concepts

Full Form Query

Dimension Trees

DIMENSION TREES = QUERY + GRAPH

[Figure: the dimension graph (R, C, B, T, M, L, E), the full form query, and a dimension tree with the nodes R, C = {Greece}, B = {BMW}, T, M, E.]

Page 19: presentation(ppt)

Introduction
Data Model
Additional Concepts
Query Containment
Experiments
Conclusion

Page 20: presentation(ppt)

Absolute Containment

Q1 ⊆ Q2: Each result of Q1 is a result of Q2.

Page 21: presentation(ppt)

Absolute Containment

Q1 ⊆ Q2: Each result of Q1 is a result of Q2.

homomorphism from Q2 to Q1

Page 22: presentation(ppt)

Absolute Containment

Q1 ⊆ Q2: Each result of Q1 is a result of Q2.

homomorphism from Q2 to Q1

[Figure: query Q1 (PSP *p1 and PSP p2 over the nodes C, B, M, B, E, C) and query Q2 (PSP *p3 and PSP p4 over the nodes C, M, E, C), illustrating the homomorphism from Q2 to Q1.]
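
A homomorphism test of this kind can be sketched for a deliberately simplified case. The sketch assumes single-path queries in which every dimension occurs at most once, so a node of Q2 can only map to the node of Q1 with the same dimension; it ignores multiple PSPs, node sharing and output paths. The dictionary layout and the names `closure` and `maps_into` are hypothetical, and this is not the procedure used by the authors.

```python
def closure(pairs):
    """Transitive closure of a set of (upper, lower) dimension pairs."""
    result = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(result):
            for c, d in list(result):
                if b == c and (a, d) not in result:
                    result.add((a, d))
                    changed = True
    return result


def maps_into(q2, q1):
    """True if the single-path query q2 maps homomorphically into q1."""
    # Each node of q2 needs a same-dimension node in q1 whose annotation
    # is at least as restrictive (None stands for "?").
    for dim, values2 in q2["nodes"].items():
        if dim not in q1["nodes"]:
            return False
        values1 = q1["nodes"][dim]
        if values2 is not None and (values1 is None or not values1 <= values2):
            return False
    # Parent-child edges of q2 must be parent-child edges of q1.
    if not q2["child"] <= q1["child"]:
        return False
    # Ancestor-descendant edges of q2 may be matched by any chain in q1.
    reachable = closure(q1["child"] | q1["descendant"])
    return q2["descendant"] <= reachable


q1 = {"nodes": {"C": {"Greece"}, "B": {"BMW"}, "M": None},
      "child": {("C", "B")}, "descendant": {("B", "M")}}
q2 = {"nodes": {"C": None, "M": None},
      "child": set(), "descendant": {("C", "M")}}

print(maps_into(q2, q1))  # q2 maps into q1, which suggests Q1 ⊆ Q2
print(maps_into(q1, q2))  # q1 needs a B node that q2 lacks
```

The direction matters: the more general query (Q2) is mapped into the more specific one (Q1).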

Page 23: presentation(ppt)

Relative Containment (w.r.t. G)

Q1 ⊆_G Q2: Each result of Q1 in G is a result of Q2 in G.

Page 24: presentation(ppt)

Relative Containment (w.r.t. G)

Q1 ⊆_G Q2: Each result of Q1 in G is a result of Q2 in G.

homomorphism from the Dimension Trees of Q2 to the Dimension Trees of Q1

Page 25: presentation(ppt)

Relative Containment (w.r.t. G)

Q1 ⊆_G Q2: Each result of Q1 in G is a result of Q2 in G.

homomorphism from the Dimension Trees of Q2 to the Dimension Trees of Q1

[Figure: a dimension tree of Q1 (nodes R, C, B, T, E, M) and a dimension tree of Q2 (nodes R, C, B, T, E).]
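
The two containment notions can be written side by side. The notation below is an assumption made for illustration: $Q(T)$ stands for the result of query $Q$ on a tree $T$, and $\mathcal{T}(G)$ for the set of trees summarized by the dimension graph $G$; neither symbol appears on the slides.

```latex
\begin{align*}
Q_1 \subseteq Q_2   &\iff \forall T:\; Q_1(T) \subseteq Q_2(T)
  && \text{(absolute containment)}\\
Q_1 \subseteq_G Q_2 &\iff \forall T \in \mathcal{T}(G):\; Q_1(T) \subseteq Q_2(T)
  && \text{(containment relative to } G\text{)}
\end{align*}
```

Under this reading, absolute containment implies relative containment for every $G$, because the second condition quantifies over fewer trees.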

Page 26: presentation(ppt)

Relative Containment Heuristic

Absolute Containment (AC): 1 msec
Relative Containment (RC): 100 msec

Page 27: presentation(ppt)

Relative Containment Heuristic

Absolute Containment (AC): 1 msec
Relative Containment (RC): 100 msec

Relative Containment Heuristic (RCH): sound but not complete
1. extract structural information from the Dimension Graph
2. insert it in the query Q1
3. check Q1 ⊆ Q2 instead of Q1 ⊆_G Q2

Page 28: presentation(ppt)

Relative Containment Heuristic

Example

[Figure: the dimension graph (R, C, B, T, M, L, E) with two example queries: Q1 (PSP *p1 with the nodes B = ? and T = ?) and Q2 (PSP *p2 with the nodes B = ? and C = ?).]

Page 29: presentation(ppt)

Relative Containment Heuristic

Example

[Figure: the dimension graph (R, C, B, T, M, L, E) with two example queries: Q1 (PSP *p1 with the nodes B = ? and T = ?) and Q2 (PSP *p2 with the nodes B = ? and C = ?).]

B=>T : R->C, C=>B

Page 30: presentation(ppt)

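Relative Containment Heuristic

Example

[Figure: the dimension graph (R, C, B, T, M, L, E) with two example queries: Q1 (PSP *p1 with the nodes B = ? and T = ?, extended with the nodes C = ? and R = ?) and Q2 (PSP *p2 with the nodes B = ? and C = ?). Result: Q1 ⊆_G Q2.]

B=>T : R->C, C=>B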
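
One possible reading of the extraction step in this example is sketched below: for a dimension used in Q1, collect the dimensions that lie on every path from the root of the dimension graph to it, and relate neighbours with `->` when the direct edge is the only connection and with `=>` otherwise. This reading, the edge set of the graph `G`, and the function names are assumptions made for illustration; the slides do not spell out the extraction procedure.

```python
def reachable(graph, start, removed_node=None, removed_edge=None):
    """Dimensions reachable from start, optionally ignoring one node or one edge."""
    if start == removed_node:
        return set()
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt == removed_node or (node, nxt) == removed_edge or nxt in seen:
                continue
            seen.add(nxt)
            stack.append(nxt)
    return seen


def forced_ancestors(graph, root, target):
    """Dimensions on every root-to-target path, ordered from the root down."""
    candidates = reachable(graph, root) - {target}
    forced = [d for d in candidates
              if target not in reachable(graph, root, removed_node=d)]
    # Removing a forced ancestor nearer the root cuts off more of the graph.
    return sorted(forced, key=lambda d: len(reachable(graph, root, removed_node=d)))


def structural_info(graph, root, target):
    """Relationships between consecutive forced ancestors of target."""
    chain = forced_ancestors(graph, root, target) + [target]
    info = []
    for upper, lower in zip(chain, chain[1:]):
        only_direct = (lower in graph.get(upper, ())
                       and lower not in reachable(graph, upper,
                                                  removed_edge=(upper, lower)))
        info.append(f"{upper}->{lower}" if only_direct else f"{upper}=>{lower}")
    return info


# Assumed edges of the dimension graph (not recoverable from the transcript).
G = {"R": ["C"], "C": ["L", "B"], "L": ["B"], "B": ["T"],
     "T": ["M", "E"], "M": ["E"], "E": ["M"]}

print(structural_info(G, "R", "B"))
```

With these assumed edges the call yields R->C and C=>B, the same relationships the slide lists for B=>T. The enriched Q1 would then go through the absolute containment check in place of the costlier relative one.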

Page 31: presentation(ppt)

Introduction
Data Model
Additional Concepts
Query Containment
Experiments
Conclusion

Page 32: presentation(ppt)

Experiments

We measured…

execution time for
  Absolute Containment (AC)
  Relative Containment (RC)
  Relative Containment Heuristic (RCH)

accuracy for RCH

…for various graph sizes
…for various query sizes

Page 33: presentation(ppt)

Time

[Figure: execution time (msec) of AC, RC and RCH. Panels by graph size: 20 dimensions (10 - 80 graph paths), 30 dimensions (15 - 120 graph paths), 40 dimensions (20 - 160 graph paths). Panels by query size: 1 PSP and 2 PSPs, each with 3 - 6 nodes per PSP.]

Page 34: presentation(ppt)

Accuracy of RCH

80% for graphs of common sizes
  based on XML benchmarks (XMach, XMark, etc.)

50% for graphs of higher density

Page 35: presentation(ppt)

Introduction
Data Model
Additional Concepts
Query Containment
Experiments
Conclusion

Page 36: presentation(ppt)

Conclusion

Query Containment for Partially Specified Tree-Pattern Queries (PSTPQs).

Sound technique for checking Relative Query Containment
  Time: one order of magnitude
  Accuracy: over 80%

Page 37: presentation(ppt)

Future Work

Heuristics for checking Relative Containment
  precomputed and on-the-fly
  trade-off between time and accuracy

Special forms of queries, e.g. swings:

[Figure: a swing query with PSP p1, PSP p2 and PSP *p3 over the dimensions A, B and C.]

Page 38: presentation(ppt)

Questions?

Page 39: presentation(ppt)


Links

Introduction (2-9)

Data Model (10-17)

Additional Concepts (18-20)

Query Containment (21-32)

Experiments (33-36)

Conclusion (37-41)

Appendix (42-46)

Page 40: presentation(ppt)

Appendix

Page 41: presentation(ppt)

Who defines the dimensions?

Automatic
  XML tags (dimension graph = “path summary”, “path index”, “structural summary”)

Semi-automatic
  Graph administrator + XML tags (dimension = group of XML tags)
  Graph administrator + ontology

Manual
  Graph administrator

Page 42: presentation(ppt)

Inference Rules

1. Full Form Query

[Figure: the dimension graph (R, C, B, T, M, L, E) and the query with PSP p1 and PSP *p2 (C = {Greece}, B = {BMW}, M = ?, B = {BMW}, E = ?), extended in its full form with an additional node C = {Greece}.]

INFERENCE RULES

Notation: A[p] -> B[p] (parent-child), A[p] => B[p] (ancestor-descendant), A[p1] ≈ A[p2] (node sharing)

(IR1) |- R[p1] ≈ R[p2]
(IR2) A[p1] ≈ A[p2], A[p2] ≈ A[p3] |- A[p1] ≈ A[p3]
(IR3) a structural expression that involves A[p] |- R[p] => A[p]
(IR4) A[p] -> B[p] |- A[p] => B[p]
(IR5) A[p] => B[p], B[p] => C[p] |- A[p] => C[p]
(IR6) A[p] -> B[p], A[p] => C[p] |- B[p] => C[p]
(IR7) A[p] -> B[p], C[p] => B[p] |- C[p] => A[p]
(IR8) A[p1] -> B[p1], B[p1] ≈ B[p2] |- A[p2] -> B[p2]
(IR9) A[p1] => B[p1], B[p1] ≈ B[p2] |- A[p2] => B[p2]
(IR10) A[p1] => B[p1], A[p1] ≈ A[p2], R[p2] => B[p2] |- A[p2] => B[p2]
(IR11) A[p1] => B[p1], B[p1] ≈ B[p2] |- A[p1] ≈ A[p2]
(IR12) A[p1] -> B[p1], C[p2] -> B[p2], D[p1] ≈ D[p2] |- D[p1] => A[p1]
(IR13) A[p1] -> B[p1], A[p2] -> C[p2], D[p1] ≈ D[p2] |- D[p1] => A[p1]
(IR14) A[p1] => B[p1], B[p2] => A[p2], C[p1] ≈ C[p2] |- C[p1] => A[p1]
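
Rules of this kind are typically applied until nothing new can be derived. The sketch below does this for three single-path rules only, assuming that IR4 reads "a parent-child relationship implies an ancestor-descendant relationship" (the operator is missing in the transcript), together with IR3 and IR5. The function name `derive_descendants` and the input encoding are hypothetical; the remaining rules, which involve several paths, are left out.

```python
def derive_descendants(child, descendant, root="R"):
    """Close the relationships of one path under IR3, IR4 and IR5.

    child, descendant: sets of (upper dimension, lower dimension) pairs.
    Returns every ancestor-descendant pair that can be derived.
    """
    # IR4 (as assumed here): A -> B |- A => B
    derived = set(child) | set(descendant)

    # IR3: every dimension involved in a structural expression lies below the root.
    involved = {dim for pair in derived for dim in pair} - {root}
    derived |= {(root, dim) for dim in involved}

    # IR5: A => B, B => C |- A => C, repeated until a fixpoint is reached.
    changed = True
    while changed:
        changed = False
        for a, b in list(derived):
            for c, d in list(derived):
                if b == c and (a, d) not in derived:
                    derived.add((a, d))
                    changed = True
    return derived


pairs = derive_descendants(child={("C", "B")}, descendant={("B", "M")})
for upper, lower in sorted(pairs):
    print(f"{upper} => {lower}")
```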

Page 43: presentation(ppt)

Dimension Trees

[Figure: the dimension graph (R, C, B, T, M, L, E), the full form query (PSP p1 and PSP *p2; C = {Greece}, B = {BMW}, M = ?, B = {BMW}, E = ?, C = {Greece}), and several dimension trees that start with R, C = {Greece}, B = {BMW}, T and differ in how M and E are placed. Path expressions shown next to the trees:]

r/Greece/BMW/*T[*E]/*M
r/Greece/BMW/*T/*M [*E]
r/Greece/BMW/*T[*M/*E]/*E*M
r/Greece/BMW/*T/*E/*M

Page 44: presentation(ppt)

Previous Approaches

Keyword-based search approach
  Absence of structure

Naive approach
  All possible query patterns are generated (Honda=>Greece, Greece=>Honda)

Approximation techniques
  Relax the query to obtain more answers

Traditional integration approach
  Global structure and mapping rules
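
The cost of the naive approach is easy to see in code. The sketch assumes that "all possible query patterns" means every ordering of the query nodes along an ancestor-descendant chain, as in the Honda/Greece example; the function name `naive_patterns` is hypothetical.

```python
from itertools import permutations


def naive_patterns(nodes):
    """Every ordering of the nodes, written as an ancestor-descendant chain."""
    return ["=>".join(order) for order in permutations(nodes)]


print(naive_patterns(["Honda", "Greece"]))
print(len(naive_patterns(["Honda", "Greece", "Travel", "Varadero"])))
```

The number of generated patterns grows factorially with the number of query nodes, which is the drawback a partially specified query avoids.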