a study of a positive fragment of path queries: expressiveness, normal form and minimization

46
Yuqing Wu, Dirk Van Gucht Indiana University Marc Gyssens Hasselt University Jan Paredaens University of Antwerp A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

Upload: gezana

Post on 22-Mar-2016

21 views

Category:

Documents


0 download

DESCRIPTION

A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization. Yuqing Wu, Dirk Van Gucht Indiana University Marc Gyssens Hasselt University Jan ParedaensUniversity of Antwerp. TexPoint fonts used in EMF. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

Yuqing Wu, Dirk Van Gucht Indiana UniversityMarc Gyssens Hasselt UniversityJan Paredaens University of Antwerp

A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form

and Minimization

Page 2: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

Research in XMLXML data modelXML query languages

XPathXQuery……

XML data repositories Support from DB vendorsLORE, Niagara, TIMBER……

2

Page 3: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

Research in XMLCharacteristics of XML query languages

XPath and fragmentsCharacteristics: expressiveness,

distinguishibility, complexity, …System design

Query processing and evaluationNew access methods: structural join

Integrity, security, ……

3

Page 4: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

Theoretical Study System Design RDB did very well in this aspectOur work at Indiana University

Coupling the theoretical study of XML data and query language and the system design of XML search engines [ICDT-EROW 07]

Coupling the partition of XML documents induced by the structure of XML document with the partition induced by fragments of XPath algebras. [DBPL 07, IS 08]

Applying the coupling in the design of structural indices for XML [WebDB 08]

Designing workload sensitive structural indices for XML. [in submission]

4

Page 5: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

OutlineWhat we studiedEquivalences of query languagesNormal formResolution expressivenessEfficient query evaluationSummary and discussion

5

Page 6: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

OutlineWhat we studied

XML documentsPath+ algebraTree queries

Equivalences of query languagesNormal formResolution expressivenessEfficient query evaluationSummary and discussion

6

Page 7: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

XML DocumentsA labeled tree (V, Ed, l),

whereV is the set of nodesEd is the set of edgesl : VL is a node-

labeling function.

7

Page 8: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

Querying XML Documentfor $i in doc(…)//a/b

for $j in $i/c/*/d[e] for $k in $j/*/f

return ($i, $k)intersectfor $i in doc(…)//a/b

for $j in $i/c/a/d for $k in $j/c/f

return ($i, $k)

8

Page 9: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

Path+ Algebra – Path Semantics

¼ (1)

})(|),{()(ˆ)(

}|),{()(

lmVmmmDl

DVmmmD

l

1)(

)(

EdD

EdD

)()()(

))()(()(;)}(),(:|),{()()}(),(:|),{()(

2121

21324,121

2

1

DEDEDEE

DEDEDEEDEnmmnnDEDEnmnmmDE

9

Page 10: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

Path+ Expression – An Example

¼ (1)

;ˆ;;ˆ;ˆ;;);))ˆ;;(

);ˆ;(((;;)ˆ;;ˆ(;)ˆ;;ˆ(;)(

1

12221

dccc

acacdE

E(D) = {(n8,n11), (n8,n12)}

10

;ˆ;;ˆ;ˆ

;

;);))ˆ;;();ˆ;(((

;

;)ˆ;;ˆ(;)ˆ;;ˆ(;)(

1

12

221

dcc

ca

cacdE

Page 11: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

Interesting Sub-languages

212121 |||;|||||: EEEEEElE

Path+ :

Path+(1, 2) :

DPath+(1) : EEElE 121 |;||||:

11

EEEElE 2121 ||;|||||:

Page 12: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

Tree Query for XML A tree query T is a 3-

tuple (T, s, d), with T: a labeled tree –

nodes of T are either labeled with a symbol of L or with a wildcard *.

s and d: nodes of T, called the source and destination nodes.

12

Page 13: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

OutlineWhat we studiedEquivalences of query languages

Normal formResolution expressivenessEfficient query evaluationSummary and discussion

13

Page 14: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

Equivalences of Query LanguagesTheorem

The query languages Path+ , T and Path+

(1, 2) are all equivalent in expressive power, and there exist translation algorithms between any two of them.

14

Page 15: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

Path+ Expression Tree Query T

¼ (1)

})(|),{()(ˆ)(

}|),{()(

lmVmmmDl

DVmmmD

l

1)(

)(

EdD

EdD

)}(),(:|),{()(

)}(),(:|),{()(

2

1

DEnmmnnDEDEnmnmmDE

)()()(

))()(()(;

2121

21324,121

DEDEDEE

DEDEDEE

*

l*

*s

d

*

*d

s

15

Page 16: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

Transformation of Composition

16 ))()(()(; 21324,121 DEDEDEE

Page 17: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

Transformation of Intersection

E1 E2

17 )()()( 2121 DEDEDEE

Page 18: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

Tree Query T Path+ Expression

Base cases: Empty tree(<{n},>, n,n)

(<{n1, n2},{(n1, n2)}>, n1, n2)

(<{n1, n2},{(n1, n2)}>, n2, n1)

s (d)

s d

d s

s (d)

l l̂

18

Page 19: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

Tree Query T Path+ Expression

dpTTtoP ,,; 2

Recursive case #1: s is not an ancestor of d.

s has no child and l(s)=*

d is parent of s, d has no ancestor, no other child and l(d)=*

d s

p

T1

T2 d

s

dpTTtoPssTTtoP ,,;;,, 21

;,,1 ssTTtoP

19

Page 20: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

Recursive case #2: s is not the root.

s has no child and l(s)=*

Tree Query T Path+ Expression

dsTTtoPsrTTtoP ,,;,, 122

d

s

r

T1

T2

s

r

d srTTtoP ,,22

20

Page 21: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

Recursive case #3: s is a strict ancestor of d.

d has no child and l(d)=*

s is parent of d, s has no child other than d and l(d)=*

Tree Query T Path+ Expression

;,,2 psTTtoP

ddTTtoPpsTTtoP ,,;;,, 12

ddTTtoP ,,; 1

d

s

d

s

T1

T2

p

21

Page 22: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

Recursive case #4: s = d is the root.

l(s)=*

Tree Query T Path+ Expression

)(;,,;;,, 1111 scsTTtoPcsTTtoP nn l

s, d…

T1

s, d

Tn

c1

cn

nn csTTtoP

csTTtoP

,,

;;,,

1

111

22

Page 23: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

Equivalences of Query LanguagesTheorem

The query languages Path+ , T and Path+

(1, 2) are all equivalent in expressive power, and there exist translation algorithms between any two of them.

Path+ exp T query Path+(1, 2) exp

23

Page 24: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

OutlineWhat we studiedEquivalences of query languagesNormal formResolution expressivenessEfficient query evaluationSummary and discussion

24

Page 25: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

Normal Form Observation about the tree query Path+(1,2)

transformation:The resultant Path+(1,2) expression is of the form

where m≥0 and n ≥0Ci (i = um ,…, u1, d1 ,…, dn) are of the formCtop is of the form

E is a DPath+(1) expression.

nm ddtopuu CCCCC 11

?*1 ]ˆ[)]([ lE?*

12 ]ˆ[)]()][([ lEE

EEElE 121 |;||ˆ||:

d

rt

s

25

Page 26: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

Normal FormE(Tts) -1; E(Ttt) ; 2 (E(Trt)); E(Ttd)

E(Tts), E(Ttt),E(Trt),E(Ttd) areDPath+(1) expressions Tt

s Tt

d

Tr

t

Tt

t d

r

t

s

26

Page 27: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

OutlineWhat we studiedEquivalences of query languagesNormal formResolution expressivenessEfficient query evaluationSummary and discussion

27

Page 28: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

Resolution ExpressivenessResolution expressiveness: a language’s

ability to distinguish a pairs of nodes of a pair of paths in the document.

28

Page 29: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

Expression EquivalenceNodes m1 and m2 are expression-related

(m1 ≥exp m2), if for each expression E, E(D)(m1) implies E(D)(m2) , where E(D)(m) = {n | (m,n) E(D)}.

m1 =exp m2 if m1 ≥exp m2 and m2 ≥exp m1

29

Page 30: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

1-equivalence Nodes m1 and m2 are downward 1-related (m1 ≥

1 m2) iffl (m1) = l (m2);For each child n1 of m1, there exist a child n2 of m2 such that

n1 ≥1 n2.

Nodes m1 and m2 are 1-related (m1 ≥1 m2) iffm1 ≥

1 m2

if m1 is not the root and p1 is the parent of m1 , then m2 is not the root with parent p2 such that p1 ≥1 p2 .

m1 =1 m2 if m1 ≥1 m2 and m2 ≥1 m1.

(m1, n1) ≥1 (m2, n2) if m1 ≥1 m2 and n1 ≥1 n2 and sig(m1, n1) = sig(m2, n2) .

30

Page 31: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

Resolution ExpressivenessTheorem :

m1 =exp m2 iff m1 =1 m2

Theorem : (m1,n1) E(D) implies (m2,n2) E(D) iff (m1,n1) ≥1 (m2,n2)

31

Page 32: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

OutlineWhat we studiedEquivalences of query languagesNormal formResolution expressivenessEfficient query evaluationSummary and discussion

32

Page 33: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

Tree Query Minimization1st Reduction: merging 1-equivalent nodes

in a tree query;*

a a

* c

*

c

*dd

sd

c

*d

*

a a

* c

*

c

*dd

sd

33

Page 34: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

Tree Query Minimization 1-*-related (≥*

1 ): relax 1-related with l (m1) + l (m2) = l (m2);

2nd Reduction: deleting from a tree query in a top-down fashion every node m1 for which there exists another node m2 such that m1 ≥*

1

m2 . *

a a

* c

*

c

*dd

sd

*

a

c

* d

sd

34

Page 35: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

Efficient Query Evaluation Path+

Expression

E(Tts) -1; E(Ttt) ; 2 (E(Trt)); E(Ttd)E(Tts) , E(Ttt), E(Trt), E(Ttd)

are DPath+(1) expressionsTt

s Tt

d

Trt

Ttt d

r

t

s

Minimum Tree

Query

Tree Query

1st & 2nd Reduction

Normal Form

35

Page 36: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

Efficient Query EvaluationExp = E(Tts) -1; E(Ttt) ; 2 (E(Trt)); E(Ttd)

E(Tts) , E(Ttt), E(Trt), E(Ttd) are DPath+

(1) expressionsDPath+(1) queries can be evaluated via

index-only plan using P(k)-Trie index. [Bre08]

[Bre08]: Sofia Brenes, Yuqing Wu, Dirk Van Gucht, Pablo Santa Cruz. Trie Indices for Efficient XML Query Evaluation. WebDB 2008.36

Page 37: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

OutlineWhat we studiedEquivalences of query languagesNormal formResolution expressivenessEfficient query evaluationSummary and discussion

37

Page 38: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

Summary

212121 |||;|||ˆ||: EEEEEElE

Objects of study:XML document: a treePath+ language : Tree queries

Areas of study:ExpressivenessEquivalenceNormal formQuery evaluation

38

Page 39: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

Extending the Path+ language

**

Adding operators:

Will the results hold? ExpressivenessEquivalenceNormal formQuery evaluation

39

Page 40: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

Yuqing Wu, Dirk Van Gucht Indiana UniversityMarc Gyssens Hasselt UniversityJan Paredaens University of Antwerp

A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form

and Minimization

Thank you.

Questions? A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form

and Minimization

Page 41: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

41

SIGMOD/PODS 2010 Indianapolis, Indiana, USA

Conference date: Jun, 2010Deadlines: SIGMOD early Nov, 2009

PODS early Dec, 2009

Page 42: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

P[k]-Trie IndexA1

D1C2

B3B2C1

B4A2B1

C3

B5

C4

Keep track of the P[k]-partitions

Use the reverse label path as key P

[2](A)(B)

(C)

(D)

{(A1, A1), (A2, A2)}{(B1, B1), (B2, B2), (B3, B3), (B4, B4), (B5, B5)}{(C1, C1), (C2, C2), (C3, C3), (C4, C4)}{(D1, D1)}

(A,A)(A,B)(B,B)(B,C)(B,D)

{(A1, A2)}{(A1, B1), (A2, B2), (A2, B3), (A1, B4)}{(B4, B5)}{(B1, C1), (B2, C2), (B3, C3), (B5, C4)}{(B2, D1)}

(A,A,B)(A,B,B)(A,B,C)(A,B,D)(B,B,C)

{(A1, B2), (A1, B3)}{(A1, B5)}{(A1, C1), (A2, C2), (A2, C3)}{(A2, D1)} {(B4, C4)}

42

Page 43: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

Query Evaluation with P[k]-Trie IndexQuery 1: //A/B/C

A1

D1C2

B3B2C1

B4A2B1

C3

B5

C4

43

Page 44: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

Query Evaluation with P[k]-Trie IndexQuery 2: //B/C

A1

D1C2

B3B2C1

B4A2B1

C3

B5

C4

44

Page 45: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

Query Evaluation with P[k]-Trie IndexQuery 3: //A/B[./D]/C A1

D1C2

B3B2C1

B4A2B1

C3

B5

C4

45

Page 46: A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

Query Evaluation with P[k]-Trie IndexQuery 3: //A/B[./D]/C A1

D1C2

B3B2C1

B4A2B1

C3

B5

C4

46