From tree patterns to generalized tree patterns: On efficient evaluation of XQuery
Z.M. Chen, H.V. Jagadish, L.V.S. Lakshmanan, S. Paparizos
(VLDB 2003)
Fatih Gön 2002701366
Mehmet Şenvar 2003700221
Bogazici University Department of Computer Engineering
Overview
Motivation: The current approach to XQuery evaluation is not efficient. A concise XQuery model is needed as the basis for generating an efficient physical evaluation plan.
Main contributions:
• Generalized Tree Pattern (GTP) query model
• Algorithm translating function-free XQuery to GTP
• Physical algebra and algorithm translating GTP to a physical plan
• Schema-aware optimization of GTP and physical plan
Motivation
Current approaches
Navigational plan (NAV): traverses down the path by recursively fetching all child nodes and filtering out the unwanted ones before the next iteration.
Baseline plan (BASE): uses a TAX operator that takes a tree pattern and a sequence of trees as input. Some tree patterns may be evaluated repeatedly.
Our approach
Generalized Tree Pattern (GTP): uses the GTP as the XQuery model to generate an efficient evaluation plan.
Tree pattern query
(a)
Tree T: $p is the root, with $s and $l below it; $g is below $l.
Boolean formula F: $p.tag = person & $s.tag = state & $l.tag = profile & $g.tag = age & $g.content > 25 & $s.content != ‘MI’
(b)
Tree T: $p is the root, with $w below it; $t is below $w.
Boolean formula F: $p.tag = person & $w.tag = watches & $t.tag = watch
Generalized tree pattern (GTP)
(a) An XQuery example
FOR $p IN document(“auction.xml”)//person, $l IN $p/profile
WHERE $l/age > 25 AND $p//state != ‘MI’
RETURN <result> {$p//watches/watch} {$l/interest} </result>
(b) Generalized tree pattern
Nodes (group number in parentheses): $p (0), $s (0), $l (0), $g (0), $w (1), $t (1), $i (2)
Edges: $p to $s (ad, mandatory); $p to $l (pc, mandatory); $l to $g (pc, mandatory); $p to $w (ad, optional); $w to $t (pc, mandatory); $l to $i (pc, optional)
Formula: $p.tag = person & $s.tag = state & $l.tag = profile & $i.tag = interest & $w.tag = watches & $t.tag = watch & $g.tag = age & $g.content > 25 & $s.content != ‘MI’
Generalized tree pattern (GTP)
GTP: a pair G = (T, F), where T is a tree and F is a Boolean formula.
• Each node of T is labeled by a distinct variable and has an associated group number.
• Each edge of T has a pair of associated labels <x, m>, where x specifies the axis (pc or ad) and m specifies the edge status (mandatory or optional).
• F is a Boolean combination of predicates applicable to nodes.
Group: each maximal set of nodes in a GTP connected to each other by paths not involving optional edges. By convention, group 0 includes the GTP root.
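The definition above can be sketched as a small data structure. This is only an illustration: the names Edge, compute_groups and EXAMPLE_EDGES are assumptions made here, and the edge list encodes the example GTP for the person/profile query as read from the query itself.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Edge:
    parent: str
    child: str
    axis: str       # "pc" (parent-child) or "ad" (ancestor-descendant)
    optional: bool  # True for an optional (dotted) edge


def compute_groups(root, edges):
    """Nodes linked by mandatory edges share a group; an optional edge opens a new one."""
    children = {}
    for e in edges:
        children.setdefault(e.parent, []).append(e)
    groups = {root: 0}  # by convention the root is in group 0
    next_group = 1
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for e in children.get(node, []):
            if e.optional:
                groups[e.child] = next_group
                next_group += 1
            else:
                groups[e.child] = groups[node]
            queue.append(e.child)
    return groups


# Hypothetical encoding of the example GTP (person/profile query).
EXAMPLE_EDGES = [
    Edge("$p", "$s", "ad", False),
    Edge("$p", "$l", "pc", False),
    Edge("$p", "$w", "ad", True),
    Edge("$l", "$g", "pc", False),
    Edge("$l", "$i", "pc", True),
    Edge("$w", "$t", "pc", False),
]

if __name__ == "__main__":
    print(compute_groups("$p", EXAMPLE_EDGES))
```

The numbering of non-root groups depends on traversal order; breadth-first order is used here only because it happens to reproduce the group numbers of the example.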
Pattern match (Formal Description)
A pattern match of G into a collection of trees C is a partial mapping h: G → C such that:
• h is defined on all group 0 nodes.
• If h is defined on a node in a group, then it is necessarily defined on all nodes in that group.
• h preserves the structural relationships in G.
• h satisfies the Boolean formula F.
Pattern match
A pattern match is a mapping from the pattern nodes to nodes in an XML database such that the formula associated with the pattern, as well as the structural relationships among pattern nodes, is satisfied.
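A minimal sketch of pattern matching for a pattern with mandatory edges only (no optional edges, no groups). The Node class, the dictionary encoding of the pattern and the tiny document are all assumptions made for illustration; this is a naive enumeration, not the evaluation strategy of the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    content: str = ""
    children: list = field(default_factory=list)


def descendants(node):
    for child in node.children:
        yield child
        yield from descendants(child)


def embeddings(var, node, pattern, preds):
    """Yield every mapping h (variable -> data node) that sends var to node."""
    if not preds[var](node):
        return
    partial = [{var: node}]
    for child_var, axis in pattern.get(var, []):
        candidates = node.children if axis == "pc" else list(descendants(node))
        extended = []
        for h in partial:
            for cand in candidates:
                for sub in embeddings(child_var, cand, pattern, preds):
                    extended.append({**h, **sub})
        partial = extended  # empty if a mandatory child has no match
    yield from partial


def all_matches(root, pattern_root, pattern, preds):
    nodes = [root, *descendants(root)]
    return [h for n in nodes for h in embeddings(pattern_root, n, pattern, preds)]


# Tree pattern (a): person with a state descendant and a profile/age child path.
PATTERN = {"$p": [("$s", "ad"), ("$l", "pc")], "$l": [("$g", "pc")]}
PREDS = {
    "$p": lambda n: n.tag == "person",
    "$s": lambda n: n.tag == "state" and n.content != "MI",
    "$l": lambda n: n.tag == "profile",
    "$g": lambda n: n.tag == "age" and int(n.content) > 25,
}

# Hypothetical document with two persons; only the first satisfies the formula.
DOC = Node("site", children=[
    Node("person", children=[
        Node("address", children=[Node("state", "NY")]),
        Node("profile", children=[Node("age", "30")]),
    ]),
    Node("person", children=[
        Node("address", children=[Node("state", "MI")]),
        Node("profile", children=[Node("age", "40")]),
    ]),
])

if __name__ == "__main__":
    for h in all_matches(DOC, "$p", PATTERN, PREDS):
        print({v: n.tag for v, n in h.items()})
```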
Universal GTP
A universal GTP is a GTP G = (T, F) in which some solid edges may be labeled ‘EVERY’.
The ‘SOME’ quantifier is already handled.
E.g.
FOR $o IN document(“auction.xml”)//open_auction
WHERE EVERY $b IN $o/bidder SATISFIES $b/increase > 100
RETURN <result> {$o} </result>
Tree: $o (0) has child $b (1) over an edge labeled EVERY; $b has child $i (2).
F_L: pc($o,$b) & $b.tag = bidder
F_R: pc($b,$i) & $i.tag = increase & $i.content > 100
∀$b: [F_L → ∃$i: (F_R)]
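The quantified condition of the example can be sketched directly. The input shape (an open auction as a list of bidders, each bidder a list of increase values) and the function name are assumptions made for illustration.

```python
def satisfies_every(bidders, threshold=100):
    """EVERY $b SATISFIES $b/increase > threshold.

    bidders: list of bidders, each given as a list of increase values.
    The inner comparison is existential: some increase of the bidder must
    exceed the threshold.
    """
    return all(any(inc > threshold for inc in increases) for increases in bidders)


if __name__ == "__main__":
    print(satisfies_every([[150], [120, 50]]))
    print(satisfies_every([[150], [50]]))
```

An auction with no bidders satisfies the condition vacuously, which is the usual reading of a universal quantifier over an empty set.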
Grammar for XQuery Fragment
Function-free XQuery is captured by the following grammar:
FLWR ::= ForClause LetClause WhereClause ReturnClause.
ForClause ::= FOR $fv1 IN E1, … , $fvn IN En.
LetClause ::= LET $lv1 := E1, … , $lvn := En.
ReturnClause ::= RETURN {E1} … {En}.
Ei ::= FLWR | XPATH.
WhereClause ::= WHERE φ(E1, … , En).
Algorithm GTP
Input: a FLWR expression Exp, a context group number g
Output: a GTP or GTPs with a join formula
if (g’s last level != 0) let g = g + “.0”;
foreach (“For $fv in E”) do parse(E, g);
let ng = g;
foreach (“Let $lv := E”) do {
    let ng = ng + 1;
    parse(E, ng);
}
foreach predicate p in WHERE do {
    if (p is “every El satisfies Er”) {
        let ng = ng + 1;
        parse(El, ng);
        let F_L be the formula associated with the pattern resulting from El;
        let ng = ng + 1;
        parse(Er, ng);
        let F_R be the formula associated with the pattern resulting from Er;
    } else {
        foreach Ei as p’s argument do parse(Ei, g);
    }
}
foreach “{Ei}” do {
    let ng = ng + 1;
    parse(Ei, ng);
}

Procedure parse
Input: FLWR expression or XPath expression E, context group number g
Output: part of GTP resulting from E
if (E is FLWR expression) GTP(E, g);
else buildTPQ(E);
end procedure
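The group numbering performed by the algorithm can be sketched for the simplest case. The sketch assumes a flat FLWR expression (no nested FLWR, no EVERY predicate) and plain integer group numbers instead of dotted hierarchical ones; the function name and the list-based input are hypothetical.

```python
def assign_groups(for_exprs, let_exprs, where_exprs, return_exprs, g=0):
    """Return (expression, group number) pairs for a flat FLWR expression."""
    out = [(e, g) for e in for_exprs]      # FOR expressions use the context group
    ng = g
    for e in let_exprs:                    # each LET expression opens a new group
        ng += 1
        out.append((e, ng))
    out += [(e, g) for e in where_exprs]   # non-quantified predicates stay in group g
    for e in return_exprs:                 # each return argument opens a new group
        ng += 1
        out.append((e, ng))
    return out


if __name__ == "__main__":
    for expr, group in assign_groups(
        ["//person", "$p/profile"],
        [],
        ["$l/age", "$p//state"],
        ["$p//watches/watch", "$l/interest"],
    ):
        print(group, expr)
```

For the person/profile query this yields group 0 for the FOR and WHERE paths, group 1 for $p//watches/watch and group 2 for $l/interest.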
Algorithm GTP
Input: a FLWR expression Exp, a context group number g
Output: a GTP or GTPs with a join formula
The GTP can be informally understood as follows:
1) Find matches for all nodes connected to the root by only solid edges.
2) Next, find matches for the remaining nodes (whose path to the GTP root involves one or more dotted edges), if they exist.
Translating GTP Into an Evaluation Plan
• Avoid repeated matching of similar tree patterns
• Postpone the materialization of nodes as much as possible
• Operators and methods are available in any XML database system
Physical algebra
Index Scan IS_p(S): outputs each node satisfying the predicate p, using an index over the input trees S.
Filter F_p(S): given trees S, outputs only the trees satisfying the predicate p. Order is preserved.
Sort S_b(S): sorts the input sequence of trees S on the sorting basis b.
Value Join J_p(S1, S2): a value-based comparison of the two input sequences of trees via the join predicate p. The output order follows the order of the left input S1.
Structural Join SJ_r(S1, S2): the input tree sequences S1 and S2 must be sorted on node id. The operator joins S1 and S2 on the structural relationship r between each pair. The output is sorted on S1 or S2 as needed. Variants: Outer Structural Join (OSJ), where all of S1 is included in the output, and Structural Semi-Join (SSJ), where only S1 is retained in the output.
Group By G_b(S): the input must be sorted on the grouping basis b. Groups trees on the grouping basis b.
Merge M(S1, …, Sn): the Sj’s are assumed to have the same cardinality k. For each 1 <= i <= k, merges tree i from each input under an artificial root and produces an output tree. Order is preserved.
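The three structural join variants can be sketched as follows. The sketch assumes nodes carry a (start, end, level) interval id, which is one common node-id scheme but is not stated on the slide, and it uses a nested loop for clarity rather than an efficient merge over sorted inputs. Elem, is_related and structural_join are hypothetical names.

```python
from typing import NamedTuple


class Elem(NamedTuple):
    name: str
    start: int
    end: int
    level: int


def is_related(a, d, axis):
    """True if d is a descendant (axis "ad") or a child (axis "pc") of a."""
    contains = a.start < d.start and d.end < a.end
    return contains and (axis == "ad" or d.level == a.level + 1)


def structural_join(s1, s2, axis="ad", kind="SJ"):
    """kind: "SJ" (join), "OSJ" (outer, keeps all of s1), "SSJ" (semi, keeps only s1)."""
    out = []
    for a in s1:
        partners = [d for d in s2 if is_related(a, d, axis)]
        if kind == "SSJ":
            if partners:
                out.append((a,))
        elif kind == "OSJ" and not partners:
            out.append((a, None))
        else:
            out.extend((a, d) for d in partners)
    return out


# Hypothetical inputs: p1 has a profile child, p2 has none.
P1 = Elem("p1", 1, 10, 1)
P2 = Elem("p2", 11, 14, 1)
L1 = Elem("l1", 2, 5, 2)

if __name__ == "__main__":
    for kind in ("SJ", "OSJ", "SSJ"):
        print(kind, structural_join([P1, P2], [L1], "pc", kind))
```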
Translating GTP to Physical Plans
• Evaluation Algorithm
• The plan is a DAG where each node is a physical operator or an input document
• Helper functions used: findOrder(SJs, $n), getGroupBasis(g), getGroupEvalOrder(G)
Stages of Evaluation Algorithm (7 steps)
1. Compute structural joins
2. Filter based on predicates depending on contents of more than 2 pattern nodes
3. Compute value joins
4. Compute aggregation
5. Filter based on predicates depending on aggregate values (if needed)
6. Compute value joins based on aggregate values (if needed)
7. Group return arguments (if any)
Physical plan from the GTP
[Figure: physical plan DAG for the example GTP. Tag index scans (IS) on person, state, profile, age, watches, watch and interest feed filters (F) on content != ‘MI’ and content > 25, structural joins (SJ, SSJ, OSJ) on person//state, profile/age, person/profile, person/watches, watches/watch and profile/interest, sorts (S) and group-bys (G) on person, profile, and a final merge (M) of return argument #1 and return argument #2.]
Legend: F: filter; IS: tag index scan; SSJ: structural semi-join; SJ: structural join; OSJ: outer structural join; S: sort; M: merge
Schema-Aware Optimization
• Logical Optimization
- simplify the GTP by eliminating nodes using a DTD or XML schema
• Physical Optimization
- eliminate unnecessary operators (e.g. sorting, duplicate elimination)
Schema-Aware Optimization
Internal node elimination
a//b//c → a//c, if the schema implies every path from a to c passes through b.
a/b/c → a//c?
[Figure: pattern with nodes $a, $b, $c (before) reduced to a pattern with nodes $a, $c (after)]
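The schema condition for internal node elimination can be sketched as a reachability test. The schema is assumed here to be a simplified graph mapping each element name to the set of element names allowed as its children; that representation and the function name are hypothetical.

```python
def every_path_passes_through(schema, a, c, b):
    """True if c is reachable from a, but no longer once b is removed."""

    def reachable(src, dst, banned=None):
        seen, stack = set(), [src]
        while stack:
            node = stack.pop()
            for child in schema.get(node, ()):
                if child == banned or child in seen:
                    continue
                if child == dst:
                    return True
                seen.add(child)
                stack.append(child)
        return False

    return reachable(a, c) and not reachable(a, c, banned=b)


# Hypothetical schema graphs: element -> allowed child elements.
SCHEMA_OK = {"a": {"b"}, "b": {"c", "d"}, "d": {"c"}}
SCHEMA_BYPASS = {"a": {"b", "c"}, "b": {"c"}}

if __name__ == "__main__":
    print(every_path_passes_through(SCHEMA_OK, "a", "c", "b"))
    print(every_path_passes_through(SCHEMA_BYPASS, "a", "c", "b"))
```

Under SCHEMA_OK the rewrite a//b//c → a//c is safe; under SCHEMA_BYPASS it is not, because c can also appear directly under a.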
Schema-Aware Optimization
Identifying two nodes with the same tag
FOR $b IN …//book
WHERE $b/title = ‘DB’
RETURN <x> {$b/title} {$b/year} </x>
[Figure: pattern $b with children $t, $t2 and $y (before) reduced to $b with children $t and $y (after)]
$t2 can be eliminated if the schema says every book has at most one title child.
Schema-Aware Optimization
Eliminate redundant leaves
FOR $a IN …./a[b]
RETURN {$a/c}
[Figure: pattern $a with children $b and $c (before) reduced to $a with child $c (after)]
$b can be eliminated if the schema implies every a has at least one b.
Schema-Aware Optimization
Elimination of sorting
Given two sorted inputs, the output of the structural join SJ (person/profile) will be in either person order or profile order, but in general not both.
Example: person “p1” contains person “p2” and profile “l2”, and “p2” contains profile “l1”. The join result {p1 – l2, p2 – l1} is not in both orders.
However, if the schema implies that no person can have person descendants, the output of the structural join ordered by person node id will also be in profile node id order.
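The ordering problem can be checked on the example. The document-order positions below are assumptions chosen to be consistent with the nesting in the example (p1 contains p2 and l2, p2 contains l1); the helper name is hypothetical.

```python
def sorted_on(pairs, position, start):
    """True if the pairs are in document order on the given component (0 or 1)."""
    keys = [start[pair[position]] for pair in pairs]
    return keys == sorted(keys)


# Nested persons: assumed document order p1 < p2 < l1 < l2.
NESTED_START = {"p1": 1, "p2": 2, "l1": 3, "l2": 6}
NESTED_PAIRS = [("p1", "l2"), ("p2", "l1")]  # person/profile join, person order

# No person inside a person: assumed document order p1 < l1 < p2 < l2.
FLAT_START = {"p1": 1, "l1": 2, "p2": 5, "l2": 6}
FLAT_PAIRS = [("p1", "l1"), ("p2", "l2")]

if __name__ == "__main__":
    print(sorted_on(NESTED_PAIRS, 0, NESTED_START), sorted_on(NESTED_PAIRS, 1, NESTED_START))
    print(sorted_on(FLAT_PAIRS, 0, FLAT_START), sorted_on(FLAT_PAIRS, 1, FLAT_START))
```

With nested persons the result is sorted on person but not on profile, so a sort is needed before an operator that expects profile order; without nesting both orders hold and the sort can be dropped.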
Schema-Aware Optimization
Elimination of group-by
{$l/interest}
We must group the return argument results for the FOR variable in general.
However, if schema implies each profile has at most one interest subelement, then grouping on interest can be eliminated.
Schema-Aware Optimization
Elimination of duplicate elimination
$p//watches//watch
If the schema implies watches cannot have watches descendants, the duplicate elimination is unnecessary.
Example: watches “ws1” contains watch “w1” and watches “ws2”, and “ws2” contains watch “w2”. Then ws1: {w1, w2} and ws2: {w2}, so w2 is produced twice.
$p//watches/watch?
Note: 1. t cannot have t descendants
2. A can only have one child B
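The duplicate in the example can be reproduced with a naive evaluation of $p//watches//watch. The dictionary encoding of the tree and the function names are assumptions made for illustration.

```python
def descendants(tree, node):
    for child in tree.get(node, []):
        yield child
        yield from descendants(tree, child)


def watches_descendant_watch(tree, tag, p):
    """Naive $p//watches//watch: collect watch descendants of every watches under p."""
    out = []
    for ws in descendants(tree, p):
        if tag[ws] == "watches":
            out.extend(w for w in descendants(tree, ws) if tag[w] == "watch")
    return out


# Hypothetical tree for the example: ws2 is nested inside ws1.
TREE = {"p": ["ws1"], "ws1": ["w1", "ws2"], "ws2": ["w2"]}
TAG = {"p": "person", "ws1": "watches", "ws2": "watches", "w1": "watch", "w2": "watch"}

if __name__ == "__main__":
    result = watches_descendant_watch(TREE, TAG, "p")
    print(result, "distinct:", sorted(set(result)))
```

Because w2 lies below both ws1 and ws2 it is returned twice; if the schema rules out nested watches elements, no node can be reached through two different watches ancestors.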
GTP Simplification
• Algorithm: pruneGTP(G) simplifies a GTP based on child/descendant constraints and avoidance constraints
• Steps (4)
1. Detect emptiness of (sub)queries
2. Identify nodes with same tag
3. Eliminate redundant leaves
4. Eliminate redundant internal nodes
Theorem 1 (Optimality)
Let C be a set of child/descendant constraints and G a GTP.
There is a unique GTP Hmin equivalent to G under C that has the smallest size among all equivalent GTPs.
The GTP simplification algorithm correctly simplifies G to Hmin in polynomial time.
Experiments
• TIMBER native XML database
• XMark generated documents
• P-III 866 MHz
• Windows 2000 Professional
• TIMBER had a 100 MB buffer pool
• 5 executions; the maximum and minimum are dropped and the rest averaged
• 479 MB XML document
• 479 MB XML document
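The timing protocol (five executions, drop the maximum and minimum, average the rest) can be sketched as below. The function name is hypothetical and the sample timings are invented purely to exercise the code; they are not measurements from the experiments.

```python
def trimmed_average(times):
    """Average after discarding one minimum and one maximum measurement."""
    if len(times) < 3:
        raise ValueError("need at least three measurements")
    kept = sorted(times)[1:-1]
    return sum(kept) / len(kept)


if __name__ == "__main__":
    # Made-up timings in seconds, for illustration only.
    print(trimmed_average([5.0, 1.0, 2.0, 3.0, 4.0]))
```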
Navigational & Base Plans
I. NAV
– Traverses recursively, getting all children of a node and checking the condition or name before the next iteration
– Dependent on path size and the number of children of each node
II. BASE
– Straightforward tree pattern translation approach that utilizes set-at-a-time processing
– Unlike GTP, does not make use of tree pattern reuse
Interesting Cases
• Parameters: path length, number of return arguments, query selectivity, data materialization cost
• GTP outperforms NAV and BASE for every query by one or two orders of magnitude
• All algorithms are affected by path length, NAV the most
• Query selectivity and the number of return arguments do not affect all algorithms; NAV performs the same iteration
• Data materialization cost affects both GTP and BASE, but NAV not much
CPU Timings
Scalability
• Used 24 MB, 47 MB, 239 MB, 479 MB, 2397 MB documents (Factor 1-5). Results:
• GTP scales linearly with size of database
Schema-Aware Optimization Results
• In some cases it greatly enhances performance, but very little in others.
• Works well when data materialization is not the dominating cost.
• Beneficial when the path is of the form many/many/many and is converted to many//many.
Related Work
• Navigation-based XQuery processing systems: Galax, Natix, Tamino, TIMBER
• No optimization and plan generation for XQuery as a whole exists for native systems
• GTP is 3-20 times faster than the TIMBER system
• Research is ongoing on optimizing XPath expressions using TPQs and schema knowledge
Summary & Future Work
• A novel structure called GTP is proposed
• GTPs are used as a basis for physical plan generation and query optimization
• GTP was compared with other methods in an extensive set of tests and observed to win by at least an order of magnitude
• An algorithm for schema-based simplification of GTPs is presented
• Evaluation of GTPs on relational XML systems as well as native systems
Thanks...
Questions ?