A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization
Yuqing Wu, Dirk Van Gucht (Indiana University); Marc Gyssens (Hasselt University); Jan Paredaens (University of Antwerp)
Research in XML
XML data model
XML query languages
  XPath, XQuery, …
XML data repositories
  Support from DB vendors
  LORE, Niagara, TIMBER, …
2
Research in XML
Characteristics of XML query languages
  XPath and fragments
  Characteristics: expressiveness, distinguishability, complexity, …
System design
  Query processing and evaluation
  New access methods: structural join
  Integrity, security, …
3
Theoretical Study and System Design
RDB did very well in this aspect.
Our work at Indiana University
  Coupling the theoretical study of XML data and query languages with the system design of XML search engines [ICDT-EROW 07]
  Coupling the partition of XML documents induced by the structure of the document with the partition induced by fragments of XPath algebras [DBPL 07, IS 08]
  Applying the coupling in the design of structural indices for XML [WebDB 08]
  Designing workload-sensitive structural indices for XML [in submission]
4
Outline
  What we studied
  Equivalences of query languages
  Normal form
  Resolution expressiveness
  Efficient query evaluation
  Summary and discussion
5
Outline
  What we studied
    XML documents
    Path+ algebra
    Tree queries
  Equivalences of query languages
  Normal form
  Resolution expressiveness
  Efficient query evaluation
  Summary and discussion
6
XML Documents
A labeled tree (V, Ed, λ), where
  V is the set of nodes
  Ed is the set of edges
  λ : V → L is a node-labeling function.
7
Querying XML Document
for $i in doc(…)//a/b
  for $j in $i/c/*/d[e]
    for $k in $j/*/f
return ($i, $k)
intersect
for $i in doc(…)//a/b
  for $j in $i/c/a/d
    for $k in $j/c/f
return ($i, $k)
8
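Path+ Algebra – Path Semantics
$\varepsilon(D) = \{(m, m) \mid m \in V\}$
$\emptyset(D) = \emptyset$
$\hat{\ell}(D) = \{(m, m) \mid m \in V \text{ and } \lambda(m) = \ell\}$
$\downarrow(D) = Ed$
$\uparrow(D) = Ed^{-1}$
$\Pi_1(E)(D) = \{(m, m) \mid \exists n : (m, n) \in E(D)\}$
$\Pi_2(E)(D) = \{(n, n) \mid \exists m : (m, n) \in E(D)\}$
$E_1 ; E_2(D) = \pi_{1,4}(\sigma_{2=3}(E_1(D) \times E_2(D)))$
$E_1 \cap E_2(D) = E_1(D) \cap E_2(D)$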
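The path semantics can be tried out directly on a small document. The following is a minimal sketch, assuming a four-node toy tree that is not taken from the slides; the function names (`lab`, `comp`, `meet`, and so on) are hypothetical and simply mirror the operators of the algebra as binary relations over nodes.

```python
# Hypothetical toy document D = (V, Ed, lambda); not from the slides.
V = {1, 2, 3, 4}
ED = {(1, 2), (1, 3), (3, 4)}            # parent -> child edges
LABEL = {1: "a", 2: "b", 3: "b", 4: "c"}  # node-labeling function

def eps():                 # epsilon(D): identity on all nodes
    return {(m, m) for m in V}

def empty():               # empty-set expression
    return set()

def lab(l):                # label test: identity on nodes labeled l
    return {(m, m) for m in V if LABEL[m] == l}

def down():                # child step
    return set(ED)

def up():                  # parent step: inverse of Ed
    return {(n, m) for (m, n) in ED}

def pi1(r):                # first projection: nodes that start a pair of r
    return {(m, m) for (m, _) in r}

def pi2(r):                # second projection: nodes that end a pair of r
    return {(n, n) for (_, n) in r}

def comp(r1, r2):          # composition E1 ; E2 (join on the middle node)
    return {(m, q) for (m, n) in r1 for (p, q) in r2 if n == p}

def meet(r1, r2):          # intersection of two path sets
    return r1 & r2

def comp_all(*rs):         # left-to-right composition of several relations
    out = eps()
    for r in rs:
        out = comp(out, r)
    return out

# a-nodes paired with those b-children that themselves have a c-child
result = comp_all(lab("a"), down(), lab("b"), pi1(comp(down(), lab("c"))))
print(result)
```

On this toy document the only qualifying pair is (1, 3): node 2 is a b-child of node 1 but has no c-child, so the first projection filters it out.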
9
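Path+ Expression – An Example
[Formula: an example Path+ expression E built from the labels a, c and d with the operators of the algebra; the expression itself is not recoverable from the transcript.]
E(D) = {(n8, n11), (n8, n12)}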
10
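Interesting Sub-languages
Path+ : $E ::= \varepsilon \mid \emptyset \mid \hat{\ell} \mid \downarrow \mid \uparrow \mid E ; E \mid \Pi_1(E) \mid \Pi_2(E) \mid E \cap E$
Path+(Π1, Π2) : $E ::= \varepsilon \mid \emptyset \mid \hat{\ell} \mid \downarrow \mid \uparrow \mid E ; E \mid \Pi_1(E) \mid \Pi_2(E)$
DPath+(Π1) : $E ::= \varepsilon \mid \emptyset \mid \hat{\ell} \mid \downarrow \mid E ; E \mid \Pi_1(E)$
11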
Tree Query for XML
A tree query T is a 3-tuple (T, s, d), with
  T: a labeled tree whose nodes are each labeled either with a symbol of L or with the wildcard *.
  s and d: nodes of T, called the source and destination nodes.
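To make the definition concrete, here is a minimal sketch of evaluating a tree query by brute force. It assumes the usual embedding semantics, which the slides do not spell out: a mapping from query nodes to document nodes that preserves edges and matches every non-wildcard label, with the answer being the images of (s, d). The function name `evaluate_tree_query` and the toy document are hypothetical.

```python
from itertools import product

def evaluate_tree_query(q_nodes, q_edges, q_label, s, d, V, Ed, label):
    """Return {(h(s), h(d))} over all label- and edge-preserving mappings h.

    Assumed semantics (not stated on the slides): h need not be injective,
    and '*' in the query matches any document label.
    """
    q_nodes = list(q_nodes)
    result = set()
    # Try every assignment of document nodes to query nodes (tiny inputs only).
    for image in product(sorted(V), repeat=len(q_nodes)):
        h = dict(zip(q_nodes, image))
        labels_ok = all(q_label[q] in ("*", label[h[q]]) for q in q_nodes)
        edges_ok = all((h[p], h[c]) in Ed for (p, c) in q_edges)
        if labels_ok and edges_ok:
            result.add((h[s], h[d]))
    return result

# Hypothetical toy document.
V = {1, 2, 3, 4}
Ed = {(1, 2), (1, 3), (3, 4)}
label = {1: "a", 2: "b", 3: "b", 4: "c"}

# Query tree x(a) -> y(b) -> z(c) with source x and destination y.
answer = evaluate_tree_query(
    ["x", "y", "z"], {("x", "y"), ("y", "z")},
    {"x": "a", "y": "b", "z": "c"}, "x", "y", V, Ed, label)
print(answer)

# Two wildcard nodes joined by one edge behave like the child step.
child_step = evaluate_tree_query(
    ["p", "c"], {("p", "c")}, {"p": "*", "c": "*"}, "p", "c", V, Ed, label)
```

The exhaustive search is exponential in the query size and is only meant to illustrate the role of s, d and the wildcard, not to suggest an evaluation strategy.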
12
Outline
  What we studied
  Equivalences of query languages
  Normal form
  Resolution expressiveness
  Efficient query evaluation
  Summary and discussion
13
Equivalences of Query Languages
Theorem
The query languages Path+, T and Path+(Π1, Π2) are all equivalent in expressive power, and there exist translation algorithms between any two of them.
14
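Path+ Expression → Tree Query T
[Figure: the tree queries for the basic expressions, shown next to the path semantics. ε: a single node labeled * that is both source and destination. ℓ̂: a single node labeled ℓ that is both source and destination. ↓: two nodes labeled *, with the parent as source s and the child as destination d. ↑: the same two-node tree with the child as source s and the parent as destination d.]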
15
Transformation of Composition
$E_1 ; E_2(D) = \pi_{1,4}(\sigma_{2=3}(E_1(D) \times E_2(D)))$
16
Transformation of Intersection
$E_1 \cap E_2(D) = E_1(D) \cap E_2(D)$
[Figure: the tree queries of E1 and E2]
17
Tree Query T → Path+ Expression
Base cases:
  Empty tree: ∅
  (<{n}, ∅>, n, n), with n labeled *: ε
  (<{n}, ∅>, n, n), with n labeled ℓ: ℓ̂
  (<{n1, n2}, {(n1, n2)}>, n1, n2): ↓
  (<{n1, n2}, {(n1, n2)}>, n2, n1): ↑
18
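Tree Query T → Path+ Expression
Recursive case #1: s is not an ancestor of d.
  General case: $\mathit{TtoP}(T_1, s, s) ; \uparrow ; \mathit{TtoP}(T_2, p, d)$
  If s has no child and λ(s) = *: $\uparrow ; \mathit{TtoP}(T_2, p, d)$
  If d is the parent of s, d has no ancestor and no other child, and λ(d) = *: $\mathit{TtoP}(T_1, s, s) ; \uparrow$
[Figure: T split into T1, which contains s, and T2, which contains p and d]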
19
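Tree Query T → Path+ Expression
Recursive case #2: s is not the root.
  General case: $\Pi_2(\mathit{TtoP}(T_2, r, s)) ; \mathit{TtoP}(T_1, s, d)$
  If s has no child and λ(s) = *: $\Pi_2(\mathit{TtoP}(T_2, r, s))$
[Figure: T split into T2, which contains r and s, and T1, which contains s and d]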
20
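Tree Query T → Path+ Expression
Recursive case #3: s is a strict ancestor of d.
  General case: $\mathit{TtoP}(T_2, s, p) ; \downarrow ; \mathit{TtoP}(T_1, d, d)$
  If d has no child and λ(d) = *: $\mathit{TtoP}(T_2, s, p) ; \downarrow$
  If s is the parent of d, s has no child other than d, and λ(s) = *: $\downarrow ; \mathit{TtoP}(T_1, d, d)$
[Figure: T split into T2, which contains s and p, and T1, which contains d]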
21
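Tree Query T → Path+ Expression
Recursive case #4: s = d is the root.
  General case: $\Pi_1(\mathit{TtoP}(T_1, s, c_1)) ; \ldots ; \Pi_1(\mathit{TtoP}(T_n, s, c_n)) ; \widehat{\lambda(s)}$
  If λ(s) = *: $\Pi_1(\mathit{TtoP}(T_1, s, c_1)) ; \ldots ; \Pi_1(\mathit{TtoP}(T_n, s, c_n))$
[Figure: root s = d with nodes c1, …, cn, where each Ti contains s and ci]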
22
Equivalences of Query Languages
Theorem
The query languages Path+, T and Path+(Π1, Π2) are all equivalent in expressive power, and there exist translation algorithms between any two of them.
Path+ expression → tree query T → Path+(Π1, Π2) expression
23
Outline
  What we studied
  Equivalences of query languages
  Normal form
  Resolution expressiveness
  Efficient query evaluation
  Summary and discussion
24
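Normal Form
Observation about the tree query → Path+(Π1, Π2) transformation:
The resultant Path+(Π1, Π2) expression is composed, in this order, of the components
$C_{u_m}, \ldots, C_{u_1}, C_{top}, C_{d_1}, \ldots, C_{d_n}$
where m ≥ 0 and n ≥ 0.
$C_i$ ($i = u_m, \ldots, u_1, d_1, \ldots, d_n$) are of the form $[\Pi_1(E)]^{*}[\hat{\ell}]^{?}$
$C_{top}$ is of the form $[\Pi_2(E)][\Pi_1(E)]^{*}[\hat{\ell}]^{?}$
E is a DPath+(Π1) expression: $E ::= \varepsilon \mid \emptyset \mid \hat{\ell} \mid \downarrow \mid E ; E \mid \Pi_1(E)$
[Figure: a tree query with the nodes r, t, s and d marked]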
25
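Normal Form
$E(T_{ts})^{-1} ; E(T_{tt}) ; \Pi_2(E(T_{rt})) ; E(T_{td})$
$E(T_{ts})$, $E(T_{tt})$, $E(T_{rt})$ and $E(T_{td})$ are DPath+(Π1) expressions.
[Figure: a tree query with the nodes r, t, s and d marked and the subtrees Tts, Ttt, Trt and Ttd outlined]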
26
Outline
  What we studied
  Equivalences of query languages
  Normal form
  Resolution expressiveness
  Efficient query evaluation
  Summary and discussion
27
Resolution Expressiveness
Resolution expressiveness: a language's ability to distinguish a pair of nodes or a pair of paths in the document.
28
Expression Equivalence
Nodes m1 and m2 are expression-related ($m_1 \geq_{exp} m_2$) if, for each expression E, $E(D)(m_1) \neq \emptyset$ implies $E(D)(m_2) \neq \emptyset$, where $E(D)(m) = \{n \mid (m, n) \in E(D)\}$.
$m_1 =_{exp} m_2$ if $m_1 \geq_{exp} m_2$ and $m_2 \geq_{exp} m_1$.
29
1-equivalence
Nodes m1 and m2 are downward 1-related ($m_1 \geq^{\downarrow}_1 m_2$) iff
  λ(m1) = λ(m2);
  for each child n1 of m1, there exists a child n2 of m2 such that $n_1 \geq^{\downarrow}_1 n_2$.
Nodes m1 and m2 are 1-related ($m_1 \geq_1 m_2$) iff
  $m_1 \geq^{\downarrow}_1 m_2$;
  if m1 is not the root and p1 is the parent of m1, then m2 is not the root and its parent p2 satisfies $p_1 \geq_1 p_2$.
$m_1 =_1 m_2$ if $m_1 \geq_1 m_2$ and $m_2 \geq_1 m_1$.
$(m_1, n_1) \geq_1 (m_2, n_2)$ if $m_1 \geq_1 m_2$ and $n_1 \geq_1 n_2$ and sig(m1, n1) = sig(m2, n2).
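The two node relations translate almost literally into recursive checks. The sketch below is illustrative only: the document, the `CHILDREN` and `PARENT` maps and all function names are hypothetical, the children of a node are compared with the downward relation, and the pair relation with sig is left out because sig is not defined on the slides.

```python
# Hypothetical document: each node's label is the first character of its name.
CHILDREN = {
    "A1": ["A2", "B1"],
    "A2": ["B2", "B3"],
    "B1": ["C1"],
    "B2": ["C2", "D1"],
    "B3": ["C3"],
}
PARENT = {c: p for p, cs in CHILDREN.items() for c in cs}

def label(node):
    return node[0]

def downward_related(m1, m2):
    """m1 is downward 1-related to m2: same label, and every child of m1
    is downward 1-related to some child of m2."""
    if label(m1) != label(m2):
        return False
    return all(
        any(downward_related(n1, n2) for n2 in CHILDREN.get(m2, []))
        for n1 in CHILDREN.get(m1, [])
    )

def related(m1, m2):
    """m1 is 1-related to m2: downward 1-related, and if m1 has a parent
    then m2 has a parent too and the parents are 1-related."""
    if not downward_related(m1, m2):
        return False
    if m1 in PARENT:
        return m2 in PARENT and related(PARENT[m1], PARENT[m2])
    return True

def equivalent(m1, m2):
    """1-equivalence: 1-related in both directions."""
    return related(m1, m2) and related(m2, m1)

print(related("B3", "B2"), related("B2", "B3"), related("B1", "B3"))
```

In this toy tree B3 is 1-related to B2 (its only child C3 is matched by C2), but not the other way round because B2 also has a D child. B1 and B3 agree downward but differ upward, since their parents A1 and A2 are not 1-related.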
30
Resolution Expressiveness
Theorem: $m_1 =_{exp} m_2$ iff $m_1 =_1 m_2$.
Theorem: for each expression E, $(m_1, n_1) \in E(D)$ implies $(m_2, n_2) \in E(D)$, iff $(m_1, n_1) \geq_1 (m_2, n_2)$.
31
Outline
  What we studied
  Equivalences of query languages
  Normal form
  Resolution expressiveness
  Efficient query evaluation
  Summary and discussion
32
Tree Query Minimization
1st Reduction: merging 1-equivalent nodes in a tree query.
[Figure: a tree query over the labels a, c, d and * with source s and destination d, shown before and after merging its 1-equivalent nodes]
33
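Tree Query Minimization
1-*-related ($\geq^{*}_1$): relax 1-related by replacing the condition λ(m1) = λ(m2) with λ(m1) + λ(m2) = λ(m2), that is, λ(m1) is the wildcard * or equals λ(m2).
2nd Reduction: deleting from a tree query, in a top-down fashion, every node m1 for which there exists another node m2 such that $m_1 \geq^{*}_1 m_2$.
[Figure: the tree query from the 1st reduction, shown before and after the 2nd reduction]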
34
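Efficient Query Evaluation
Path+ Expression → Tree Query → (1st & 2nd Reduction) → Minimum Tree Query → Normal Form
Normal form: $E(T_{ts})^{-1} ; E(T_{tt}) ; \Pi_2(E(T_{rt})) ; E(T_{td})$
$E(T_{ts})$, $E(T_{tt})$, $E(T_{rt})$ and $E(T_{td})$ are DPath+(Π1) expressions.
[Figure: a tree query with the nodes r, t, s and d marked and the subtrees Tts, Ttt, Trt and Ttd outlined]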
35
Efficient Query Evaluation
Exp = $E(T_{ts})^{-1} ; E(T_{tt}) ; \Pi_2(E(T_{rt})) ; E(T_{td})$
$E(T_{ts})$, $E(T_{tt})$, $E(T_{rt})$ and $E(T_{td})$ are DPath+(Π1) expressions.
DPath+(Π1) queries can be evaluated via an index-only plan using the P[k]-Trie index. [Bre08]
[Bre08]: Sofia Brenes, Yuqing Wu, Dirk Van Gucht, Pablo Santa Cruz. Trie Indices for Efficient XML Query Evaluation. WebDB 2008.
36
Outline
  What we studied
  Equivalences of query languages
  Normal form
  Resolution expressiveness
  Efficient query evaluation
  Summary and discussion
37
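Summary
Objects of study:
  XML document: a tree
  Path+ language: $E ::= \varepsilon \mid \emptyset \mid \hat{\ell} \mid \downarrow \mid \uparrow \mid E ; E \mid \Pi_1(E) \mid \Pi_2(E) \mid E \cap E$
  Tree queries
Areas of study:
  Expressiveness
  Equivalence
  Normal form
  Query evaluation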
38
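Extending the Path+ language
Adding operators: [two operators carrying a * superscript; the operator symbols are not recoverable from the transcript]
Will the results hold?
  Expressiveness
  Equivalence
  Normal form
  Query evaluation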
39
Thank you.
Questions?
41
SIGMOD/PODS 2010
Indianapolis, Indiana, USA
Conference date: June 2010
Deadlines:
  SIGMOD: early November 2009
  PODS: early December 2009
P[k]-Trie Index
[Figure: example document tree. A1 is the root with children A2, B1 and B4; A2 has children B2 and B3; B1 has child C1; B2 has children C2 and D1; B3 has child C3; B4 has child B5; B5 has child C4.]
Keep track of the P[k]-partitions
Use the reverse label path as key
P[2] partitions (label path: node pairs):
  (A): {(A1, A1), (A2, A2)}
  (B): {(B1, B1), (B2, B2), (B3, B3), (B4, B4), (B5, B5)}
  (C): {(C1, C1), (C2, C2), (C3, C3), (C4, C4)}
  (D): {(D1, D1)}
  (A,A): {(A1, A2)}
  (A,B): {(A1, B1), (A2, B2), (A2, B3), (A1, B4)}
  (B,B): {(B4, B5)}
  (B,C): {(B1, C1), (B2, C2), (B3, C3), (B5, C4)}
  (B,D): {(B2, D1)}
  (A,A,B): {(A1, B2), (A1, B3)}
  (A,B,B): {(A1, B5)}
  (A,B,C): {(A1, C1), (A2, C2), (A2, C3)}
  (A,B,D): {(A2, D1)}
  (B,B,C): {(B4, C4)}
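The P[2] partitions listed above can be recomputed from the parent relation they imply. This is a minimal sketch, not the index structure of [Bre08]: it uses a plain dictionary keyed by the forward label path instead of a trie over reversed paths, and the names `PARENT`, `build_partitions` and `lookup` are hypothetical.

```python
from collections import defaultdict

# Parent of every non-root node, as implied by the pairs in the P[2] listing.
PARENT = {
    "A2": "A1", "B1": "A1", "B4": "A1",
    "B2": "A2", "B3": "A2",
    "C1": "B1", "C2": "B2", "D1": "B2",
    "C3": "B3", "B5": "B4", "C4": "B5",
}
NODES = ["A1"] + list(PARENT)

def label(node):
    return node[0]          # node names are label + running number

def build_partitions(k):
    """Group every pair (m, n), with n at most k steps below m,
    by the label path leading from m down to n."""
    index = defaultdict(set)
    for n in NODES:
        m, path = n, [label(n)]          # walk upward from n
        for _ in range(k + 1):
            index[tuple(reversed(path))].add((m, n))
            if m not in PARENT:          # reached the root
                break
            m = PARENT[m]
            path.append(label(m))
    return dict(index)

P2 = build_partitions(2)

def lookup(label_path):
    """End nodes of all paths with the given labels, e.g. 'ABC' for //A/B/C."""
    return {n for (_, n) in P2.get(tuple(label_path), set())}

print(sorted(lookup("ABC")))   # //A/B/C
print(sorted(lookup("BC")))    # //B/C
```

With this layout, a path query of at most three labels such as //A/B/C or //B/C is answered by a single key lookup, which is the sense in which the plan touches only the index and not the document.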
42
Query Evaluation with P[k]-Trie Index
Query 1: //A/B/C
[Figure: the example document tree from the P[k]-Trie Index slide]
43
Query Evaluation with P[k]-Trie Index
Query 2: //B/C
[Figure: the example document tree from the P[k]-Trie Index slide]
44
Query Evaluation with P[k]-Trie Index
Query 3: //A/B[./D]/C
[Figure: the example document tree from the P[k]-Trie Index slide]
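Query 3 has a branch, so a single key lookup is not enough. The sketch below shows one possible index-only plan, assuming the three relevant P[2] partitions are joined on the shared B node; the plan actually shown on the slide was a figure and is not recoverable, so this join order and the name `query3` are assumptions.

```python
# The three P[2] partitions that Query 3 needs, copied from the index listing.
P2 = {
    ("A", "B"): {("A1", "B1"), ("A2", "B2"), ("A2", "B3"), ("A1", "B4")},
    ("B", "C"): {("B1", "C1"), ("B2", "C2"), ("B3", "C3"), ("B5", "C4")},
    ("B", "D"): {("B2", "D1")},
}

def query3():
    """//A/B[./D]/C: C children of B nodes that have an A parent and a D child."""
    b_under_a = {b for (_, b) in P2[("A", "B")]}   # B nodes with an A parent
    b_with_d = {b for (b, _) in P2[("B", "D")]}    # B nodes with a D child
    return {c for (b, c) in P2[("B", "C")]
            if b in b_under_a and b in b_with_d}

print(query3())
```

Only B2 has both an A parent and a D child, so the answer is its C child, C2. The document itself is never consulted: all three inputs come from the partitions.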
45