on the minimization of xpath queries
DESCRIPTION
On the Minimization of XPath Queries. Paper by S. Flesca, F. Furfaro, E. Masciari CmpE 521 Presentation Emre Yurtsever – 2002701372 Muammer Yüzügüldü – 2003700183. Outline. Introduction Trees & Tree Patterns Problem Statement A Framework for minimizing X P ath queries - PowerPoint PPT PresentationTRANSCRIPT
On the Minimization of XPath Queries
Paper byS. Flesca, F. Furfaro, E. Masciari
CmpE 521 Presentation
Emre Yurtsever – 2002701372Muammer Yüzügüldü – 2003700183
2
Outline Introduction Trees & Tree Patterns Problem Statement A Framework for minimizing XPath
queries Complexity Results Tractability Results Conclusions & Future Works
3
Introduction XML Queries are usually expressed
by means of XPath expressions.
XPath expressions.
A way of navigating an XML tree to return the set of nodes through the paths specified by the expression.
4
Introduction – cont’d
An XPath expression can be represented graphically as a tree pattern.
<author>
<genre> <editor><title>
<authors>
<author>
<genre> <editor><title>
<bib>
<book><book>
An XML Tree...
5
Introduction – cont’d
For example;“find the titles of all the books for which at least one author is known.”
XPath Expression:bib/book[//author]/title
A tree pattern
<bib>
<book>
<author> <title>
Output Node
Descendant Edge
6
Introduction – cont’d
Efficiency of XPath expression depends on size.
Optimization ~ Minimization
We should minimize the expression.
7
Introduction – cont’d
Example Query : “Retrieve the editors that published thrillers
and whose authors have written a thriller.”
“query containment”.
Reduced Query : “Retrieve the editors that published thriller.”
8
Introduction – cont’d
Minimization problem for XPath fragments can be efficiently solved as:
1. It can be reduced to solve a number of instances of containment between pairs of tree patterns.
2. For these fragments, it can be reduced to find a homomorphism between them.
9
Trees & Tree Patterns
A tree t is a tuple (rt, Nt, Et, t) where; Nt ℕ, set of nodes.
t : Nt is a node labelling function.
rt Nt is the distinguished root of t.
Et Nt x Nt, set of edges.
10
Trees & Tree Patterns – cont’d
Given a tree t = (rt, Nt, Et, t)
Tree t’ = (rt’, Nt’, Et’, t’) is the subtree if : Nt’ Nt;
The edge (ni, nj) belongs to Et’ iff ni Nt’,
nj Nt’ and (ni, nj) Et.
11
Trees & Tree Patterns – cont’d
Definition : A tree pattern p is a pair (tp, op), where:
• tp = (rp, Np, Ep, p) is a tree.
• Ep is partitioned into the two disjoint sets Cp and Dp denoting, respectively, the child and descendent branches;
• op Np is a distinguished output node.
12
Trees & Tree Patterns – cont’d
Grammar for XPath expressions : exp exp | exp/exp | exp//exp | exp[exp] | | * | .
where is a symbol in , and the symbol ‘.’ stands for the current node.
Given XPath expression; a[b/*//c]//d
a
b d
c
*
13
Trees & Tree Patterns – cont’d
Given a tree t and a tree pattern p, an embedding e of p into t is a total function e : Np Nt, such that:
1. e(rp) = rt,
2. (x; y) Cp, e(y) is a child of e(x) in t,
3. (x; y) Dp, e(y) is a descendant of e(x) in t, and
4. x Np, if p(x) = a (where a *) then t(e(x)) = a.
14
Trees & Tree Patterns – cont’d
Models and Canonical Models of Tree Patterns The models of a tree pattern p defined over
the alphabet are the trees of T which can be embedded by p. The set of models of p is Mod(p) = {t T | p(t) }
Canonical models of a tree pattern p are models having the same shape as p. That is, a canonical
15
Trees & Tree Patterns – cont’d
Model and Canonical Model of a tree pattern
16
Trees & Tree Patterns – cont’d
Given two tree patterns p1, p2, we say that p1 is contained in p2 (p1 p2) iff t p1(t) p2(t).
We say that p1 and p2 are equivalent (p1 p2) iff p1 p2 and p2 p1 (i.e. t p1(t) = p2(t)).
The set of patterns which are equivalent to a given pattern p will be denoted as Eq(p).
17
Trees & Tree Patterns – cont’d
Notations on tree patterns.
A pattern p and its subpatterns spb, spd, spa
18
Trees & Tree Patterns – cont’d
Tree pattern p whose root has 2 children
Subpattern examples
19
Problem Statement
“Given a tree pattern p, construct a tree pattern pmin which is equivalent to p and having minimum size (i.e. size(pmin) = minsize(p))”
20
Problem Statement – cont’d
1. a minimum size tree pattern equivalent to p can be found among the subpatterns of p;
2. the containment between two tree patterns p, q (p q) is equivalent to the problem of finding a homomorphism from q to p. A homomorphism h from a pattern q to a pattern p is a total mapping from the nodes of q to the nodes of p such that:
h preserves node types (i.e. u Nq q(u) `*' ) q(u)
= p(h(u))); h preserves structural relationships (i.e.whenever v is
a child (resp. descendant) of u in q, h(v) is a child (resp. descendant) of h(u) in p).
21
Problem Statement – cont’d
A homomorphism between two tree patterns
22
Problem Statement – cont’d
Two tree patterns not related with homomorphism
23
A framework for minimizing XPath Queries
Two fundamental contribution Proving that property 1 holds for
XP{/, //, [], *}
An algorithm for minimizing a tree pattern query
24
Proving that Property holds for XP{/, //, [], *}
Various lemmas are introduced
Lemma 1 : Let p and q be two patterns with root r, such that p contained in q. Then, for each subpattern Qj element of P(q) there exists a subpattern Pi element of P(p) s.t Pi contained in Qi.
25
Example for Lemma 1
26
Proving that Property holds for XP{/, //, [], *}
Lemma 2 : Let p and be two patterns rooted in r s.t p=q and let m and n, with m>n, be the number of children of r in p and, respectively, q. Then, there exist a set S subset of SP(p) consisting of m-n subpatterns spi, such that p-S = p
27
Example for Lemma 2
28
Proving that Property holds for XP{/, //, [], *} Lemma 3 : Let p and q be two
equivalent patterns rooted in r having the same number of child and descendant nodes of r, and let q be of minimum size. Then, there not exists a subpattern spk element of SP(p) such that p – spk = p
29
Proving that Property holds for XP{/, //, [], *} Lemma 4 : Let p and q be two
eqivalent patterns whose roots have the same number of child and descendant nodes, and let q be of minimum size. For each subpattern Pi element of P(p) there exists a unique subpattern Qj element of P(q) directly connected to rq s.t piqj
30
Proving that Property holds for XP{/, //, [], *} Lemma 5 : A pattern p in XP {[], /,
//, *} is not of minimum size iff at least one of the following conditions hold:
1. there exists a pair of subpatterns pi, pj s.t pi contained in pj;
2. there exists a subpattern pi of p which is not of minimum size.
31
Proving that Property holds for XP{/, //, [], *} Theorem 1 : Given a pattern p in
XP{/, //, [], *} if minsize(p) = k then there exists a subpattern pmin of p such that p= pmin and size(pmin)=k
32
An Algorithm for tree pattern minimization Function Minimize(p:a tree pattern):pmin a minimum tree pattern
equivalent to pBegin
pmin = p;For each pi element of P(pmin) do
if (pmin -spi contained in pmin) pmin = pmin – spi;
SPnew = 0;For each spi element of SP(pmin) do
SPnew = SPnew + Minimize(spi);pmin = assemble (pmin, SPnew);return pmin;
End
33
Upper Bound Algorithm 1 works in O(b*r*(|p|
^2)*((w+1)^(d+1))) |p| is the size of p d is the number descendant edges in p w is one the longest chain of ‘*’ in p b is the number branches of p as b r is the maximum degree of any node of p
34
Complexity Results
In XPath{/, //, [], *} it is not possible to define an algorithm performing much better than Algorithm 1
Lemmma 6: Let p be a pattern in XP{/, //, [], *} and k is possitive integer. The problem of testing if minsize(p)>k is NP-complete problem
35
Complexity Results
Theorem 2 : Let p be a pattern in XP{/, //, [], *} and k a positive integer. The problem of testing if there exists a pattern p’ equivalent to p such that size(p’)<= k is coN-complete
36
Tractability Results Definition : A limited branched tree
pattern p is a tree pattern in XP{/, //, [], *} such that:
1. Every non leaf node of p may have any number of children;
2. If a node n has k children n1...nk, then at least k-1 of the patterns spn, (where i element [1...k]) are linear.
37
Example
b
a
b d
b
a
c
*
*
r
b
38
Tractability Results Theorem : Let p a limited branched tree
pattern. A minimum pattern pmin equivalent to p can be found in polynomial time. (w.r.t. The size of p)
Linear patterns have minimum size The containment between pairs of linear
patterns can be decided in polynomial time.
39
Tractability Results Algorithm 2Function Minimize(p:a boundend branched tree pattern):pmin a minimum tree pattern equivalent to pBegin
pmin = p;B = {b1, ...., bm};while(B != 0)
b = deepest(B);q = spb;Redq = 0;For each pi element of P(q) do
For each qj of P(q) doif ((i!=j) ^ (qi is linear) ^ (qj not element of Redq) ^(qj
contained in qi))Redq = Redq + qi;
q = q – Redq;pmin = replace(pmin, sqb, q);
end whilereturn pmin;
End
40
Conclusion It has been proved the global minimality
property, a minimum tree pattern equivalent to a given tree pattern p can be found amoung the subpatterns of p, and thus obtained by prunning “redundant” branches from p.
It has been characerized the complexity of the minimization problem, showing that the corresponding decisional problem is coNP-complete.
41
Conclusion It has been studied a “tractable”
form of tree pattern which can be minimized in polynomial time.
It has been provided by some algorithms proposed in the paper.
42
Future Works Extending minimization framework
to deal with XPath queries that must satisfy some constraints such as join conditions on tree pattern nodes.
The introduction of these constraints makes the minimization problem harder, and global minimality property does not hold.
43
Questions?
Thank you...