A New Top-down Algorithm for Tree Inclusion
Dr. Yangjun Chen
Dept. Applied Computer Science,
University of Winnipeg
515 Portage Ave.
Winnipeg, Manitoba, Canada R3B 2E9
Outline
• Motivation
• Basic algorithm for tree inclusion problem
- Definition
- Algorithm description
• Improvements
• Summary
Motivation
Given two ordered labeled trees P and T, called the pattern and the target, respectively, an interesting problem is: can we obtain pattern P by deleting some nodes from target T? That is, is there a sequence v1, ..., vk of nodes such that, for
T0 = T and Ti+1 = delete(Ti, vi+1) for i = 0, ..., k - 1,
we have Tk = P? If this is the case, we say that P is included in T, that T contains P, or that T covers P.
[Figure: a tree T and the tree delete(T, c) obtained by deleting node c from T.]
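The delete operation and the inclusion question can be illustrated with a small sketch. It assumes the usual meaning of node deletion in an ordered tree: the children of the deleted node take its place, in order, among the children of its parent. The nested-tuple tree encoding, the function names delete and included, and the example trees are assumptions made for illustration. The included function is a naive exhaustive check with memoization, not the top-down algorithm presented in these slides.

```python
from functools import lru_cache

# A tree is a pair (label, children); children is a tuple of trees.

def delete(tree, path):
    """Return a copy of `tree` with the node at `path` deleted.

    `path` is a non-empty tuple of child indices from the root to the node.
    The deleted node's children are spliced into its parent's child list.
    Deleting the root is not handled in this sketch.
    """
    label, children = tree
    i, rest = path[0], path[1:]
    if rest:
        replacement = (delete(children[i], rest),)
    else:
        replacement = children[i][1]  # children of the deleted node move up
    return (label, children[:i] + replacement + children[i + 1:])

@lru_cache(maxsize=None)
def included(pattern_forest, target_forest):
    """Naive test: can pattern_forest be obtained from target_forest by deletions?"""
    if not pattern_forest:
        return True   # delete every remaining target node
    if not target_forest:
        return False
    (p_label, p_children), p_rest = pattern_forest[0], pattern_forest[1:]
    (t_label, t_children), t_rest = target_forest[0], target_forest[1:]
    # Option 1: delete the first target root; its children take its place.
    if included(pattern_forest, t_children + t_rest):
        return True
    # Option 2: keep the first target root and match it with the first pattern root.
    return (p_label == t_label
            and included(p_children, t_children)
            and included(p_rest, t_rest))

# Hypothetical example: T = a(c(b, d(e, f))), P = a(b, d(e, f)).
T = ("a", (("c", (("b", ()), ("d", (("e", ()), ("f", ()))))),))
P = delete(T, (0,))  # delete node c
```

Here included((P,), (T,)) evaluates to True, while included((T,), (P,)) evaluates to False, since P has no node labeled c.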
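Linguistic analysis
[Figure: a pattern parse tree (s over vp; vp over v, n, adv; with the words "reads" and "book") and the target parse tree of the sentence "The student reads the interesting book again and again" (s over np, vp; np over det, n; vp over v, np, adv; the inner np over det, adj, n).]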
Tree inclusion algorithm: Definition
Definition 1. Let F and G be labeled ordered forests. We define an ordered embedding (φ, G, F) as an injective function φ: V(G) → V(F) such that for all nodes v, u ∈ V(G),
i) label(v) = label(φ(v)); (label preservation condition)
ii) v is an ancestor of u iff φ(v) is an ancestor of φ(u); (ancestor condition)
iii) v is to the left of u iff φ(v) is to the left of φ(u). (sibling condition)
[Figure: two example forests G and F illustrating an ordered embedding of G in F.]
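The three conditions of Definition 1 can be checked mechanically for a given candidate mapping. The sketch below is only an illustration: it assumes nodes are identified by their preorder position, decides the ancestor and left-of relations from preorder and postorder numbers, and uses hypothetical names (number, is_ordered_embedding) and hypothetical example forests.

```python
def number(forest):
    """Return (labels, post), both indexed by preorder position."""
    labels, post = [], []
    counter = [0]

    def visit(node):
        label, children = node
        idx = len(labels)          # preorder position of this node
        labels.append(label)
        post.append(None)
        for child in children:
            visit(child)
        post[idx] = counter[0]     # postorder number of this node
        counter[0] += 1

    for tree in forest:
        visit(tree)
    return labels, post

def is_ancestor(post, u, v):
    # u is a proper ancestor of v: earlier in preorder, later in postorder.
    return u < v and post[u] > post[v]

def is_left_of(post, u, v):
    # u is to the left of v: earlier in both preorder and postorder.
    return u < v and post[u] < post[v]

def is_ordered_embedding(phi, G, F):
    """Check that phi (dict: preorder index in G -> preorder index in F)
    is injective and satisfies conditions i) to iii)."""
    g_labels, g_post = number(G)
    f_labels, f_post = number(F)
    nodes = range(len(g_labels))
    if len({phi[v] for v in nodes}) != len(g_labels):
        return False                                   # not injective
    for v in nodes:
        if g_labels[v] != f_labels[phi[v]]:
            return False                               # condition i)
        for u in nodes:
            if is_ancestor(g_post, v, u) != is_ancestor(f_post, phi[v], phi[u]):
                return False                           # condition ii)
            if is_left_of(g_post, v, u) != is_left_of(f_post, phi[v], phi[u]):
                return False                           # condition iii)
    return True

# Hypothetical forests: G = a(b, b), F = a(d(b), b).
G = (("a", (("b", ()), ("b", ()))),)
F = (("a", (("d", (("b", ()),)), ("b", ()))),)
good = {0: 0, 1: 2, 2: 3}   # maps the two b's of G to the two b's of F, in order
bad = {0: 0, 1: 3, 2: 2}    # swaps them, violating the sibling condition
```

With these inputs, is_ordered_embedding(good, G, F) holds and is_ordered_embedding(bad, G, F) does not.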
Algorithm
1. Let T = <t; T1, ..., Tk> (k ≥ 1) be a tree and G = <P1, ..., Pl> (l ≥ 1) be a forest. We handle G as a tree P = <pv; P1, ..., Pl>, where pv represents a virtual node, matching any node in T.
2. Consider a node v in P with children v1, ..., vj. We use a pair <i, v> (i ≤ j) to represent an ordered forest containing the first i subtrees of v: <P[v1], ..., P[vi]>. Then, <j, pv> represents the first j trees in G.
[Figure: a node v in P with children v1, ..., vi, ..., vk; the pair <i, v> covers the first i subtrees of v.]
3. In addition, h(v) represents the height of v in a tree, and δ(v) represents a link from v in P to the leaf node on the left-most path in P[v].
Let v’ be a leaf node in P. We denote by δ⁻¹(v’) the set of nodes v such that δ(v) = v’.
Example: δ⁻¹(v3) = {v1, v2, v3}
[Figure: a pattern tree P with nodes v1, ..., v5; the links δ(v1) and δ(v2) both point to the leaf v3.]
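The links δ(v) and the sets δ⁻¹(v’) can be computed by a simple traversal, as in the sketch below. The Node class, the function names, and the tree shape v1(v2(v3, v4), v5) are assumptions chosen to be consistent with the example δ⁻¹(v3) = {v1, v2, v3}; the exact shape of the tree in the figure is not recoverable from the slide text. The sketch favors brevity over efficiency, since it walks the left-most path again for every node.

```python
class Node:
    """Hypothetical pattern-tree node used only for this illustration."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

def leftmost_leaf(v):
    """delta(v): follow first children from v down to a leaf."""
    while v.children:
        v = v.children[0]
    return v

def inverse_delta(root):
    """Map each leaf name to the names of all nodes whose delta link is that leaf."""
    inverse = {}
    stack = [root]
    while stack:                      # iterative preorder traversal
        v = stack.pop()
        inverse.setdefault(leftmost_leaf(v).name, []).append(v.name)
        stack.extend(reversed(v.children))
    return inverse

# Assumed shape: v1 has children v2 and v5; v2 has children v3 and v4.
v3, v4, v5 = Node("v3"), Node("v4"), Node("v5")
v2 = Node("v2", [v3, v4])
v1 = Node("v1", [v2, v5])
links = inverse_delta(v1)
```

For this tree, links["v3"] lists v1, v2 and v3, while v4 and v5 are each linked only from themselves.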
Tree inclusion checking is done by calling two functions recursively:
top-down(T, G),
bottom-up(T’, G),
where T is a tree, and T’ and G are two forests.
Each of the two functions returns a pair <i, v> with v being pv or a node on the left-most path in P1.
T = <t; T1, ..., Tk>
T’ = <T1’, ..., Tk’>
G = <P1, ..., Pl>
Function: top-down(T, G)
In top-down(T, G), two cases will be handled.
Case 1: G = <P1>; or G = <P1, ..., Pl> (l > 1), but |T| ≤ |P1| + |P2|.
In this case, we try to find a pair <i, v> such that T contains the first i subtrees of v, where v = pv, or v ∈ δ⁻¹(v’) and v’ is the leaf node on the left-most path in P1.
[Figure: the two situations of Case 1. Either G consists of the single tree P1 under pv, or G = <P1, P2, ..., Pl> under pv with |T| ≤ |P1| + |P2|; p1 is the root of P1 and t is the root of T.]
i) If t is a leaf node, we check whether label(t) = label(δ(p1)), where p1 is the root of P1. If so, return <1, parent of δ(p1)>. Otherwise, return <0, parent of δ(p1)>.
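The leaf case can be written down directly, as in the following sketch. The PNode class with parent pointers, the helper delta, and the example pattern are assumptions for illustration; the virtual node pv is modeled as a node with label None.

```python
class PNode:
    """Hypothetical pattern node with a parent pointer."""
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

def delta(v):
    """Leaf on the left-most path of the subtree rooted at v."""
    while v.children:
        v = v.children[0]
    return v

def top_down_leaf(t_label, p1):
    """Step i): t is a leaf of T with label t_label; p1 is the root of P1.

    Returns the pair <i, v> as a tuple (i, v), where v is the parent of delta(p1).
    """
    leaf = delta(p1)
    i = 1 if t_label == leaf.label else 0
    return (i, leaf.parent)

# Hypothetical pattern: P1 = a(b(c), d), placed under the virtual node pv.
c = PNode("c")
b = PNode("b", [c])
p1 = PNode("a", [b, PNode("d")])
pv = PNode(None, [p1])
```

In this example, a target leaf labeled c yields <1, b>, meaning that the first subtree of b is covered, while a leaf with any other label yields <0, b>.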
ii) If |T| < |P1| or height(t) < height(p1), we make a recursive call top-down(T, <P11, ..., P1j>), where <P11, ..., P1j> is the forest of the subtrees of p1. The return value of top-down(T, <P11, ..., P1j>) is used as the return value of top-down(T, G).
[Figure: T with root t and |T| < |P1|; G = <P1, ..., Pl> under pv, where p1 has the subtrees P11, ..., P1i, ..., P1j.]
iii) If |T| ≥ |P1| (but |T| ≤ |P1| + |P2|) and height(t) ≥ height(p1), two cases need to be considered:
• label(t) = label(p1). Call bottom-up(<T1, ..., Tk>, <P11, ..., P1j>).
• label(t) ≠ label(p1). Call bottom-up(<T1, ..., Tk>, <P1>).
[Figure: the two sub-cases. If label(t) = label(p1), the subtrees T1, ..., Tk of t are checked against the subtrees P11, ..., P1j of p1; if label(t) ≠ label(p1), they are checked against P1 itself.]
In both sub-cases, assume that the return value is <i, v>. A further check needs to be conducted:
• If label(t) = label(v) and i = the outdegree of v, the return value should be <1, v’s parent>.
• Otherwise, the return value is the same as <i, v>.
[Figure: the root t of T and a node v in P1, with either label(t) = label(v) or label(t) ≠ label(v).]
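The further check on a returned pair <i, v> is a purely local test, sketched below. The PNode class and the name post_check are assumptions for illustration. The sketch also assumes that the virtual node pv carries the label None, so that it never triggers the promotion step.

```python
class PNode:
    """Hypothetical pattern node with a parent pointer."""
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

def post_check(t_label, result):
    """Promote <i, v> to <1, v's parent> when t matches v and all
    subtrees of v are covered; otherwise return the pair unchanged."""
    i, v = result
    if v.label == t_label and i == len(v.children):
        return (1, v.parent)
    return (i, v)

# Hypothetical pattern: P1 = a(b(c, d)), placed under the virtual node pv.
b = PNode("b", [PNode("c"), PNode("d")])
p1 = PNode("a", [b])
pv = PNode(None, [p1])
```

If the children of t cover both subtrees of b and t itself is labeled b, then post_check("b", (2, b)) returns <1, a>: the whole subtree P[b], which is the first subtree of a, is included in T. With only one subtree covered, or with a different label on t, the pair is returned unchanged.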
Case 2: G = <P1, ..., Pl> (l > 1), and |T| > |P1| + |P2|. In this case, we call bottom-up(<T1, ..., Tk>, G). Assume that the return value is <i, v>. The following checks are then carried out.
[Figure: Case 2. G = <P1, P2, ..., Pl> under pv and T = <t; T1, T2, ..., Tk> with |T| > |P1| + |P2|.]
iv) If v = p1’s parent, the return value is the same as <i, v>.
v) If v ≠ p1’s parent, check whether label(t) = label(v) and i = the outdegree of v. If so, the return value is changed to <1, v’s parent>. Otherwise, the return value remains <i, v>.
[Figure: the two situations after the call. Either v = p1’s parent = pv, or v ≠ p1’s parent and v is a node inside P1.]
Function: bottom-up(T’, G)
bottom-up(T’, G) is designed to handle the case that both T’ and G are forests. Let T’ = <T1, ..., Tk> and G = <P1, ..., Pq>. In bottom-up(T’, G), we make a series of calls top-down(Tl, <Pjl, ..., Pq>), where l = 1, ..., k, j1 = 0, and j1 ≤ j2 ≤ ... ≤ jh ≤ q (for some h ≤ k), controlled as follows.
[Figure: the forests T’ = <T1, T2, ..., Ti, ..., Tk> and G = <P1, ..., Pi, ..., Pq>; each Tl is checked by a call top-down(Tl, <Pjl, ..., Pq>).]
1. Two index variables l and j are used to scan T1, ..., Tk and P1, ..., Pq, respectively.
2. Let <il, vl> be the return value of top-down(Tl, <Pj, ..., Pq>). If vl = pj’s parent, set j to be j + il - 1. Otherwise, j is not changed. Set l to be l + 1. Go to (2).
3. The loop terminates when all Tl’s or all Pj’s are examined.
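The control flow of this scan can be sketched as follows. This is one reading of the steps above, under stated assumptions: indices are 0-based, j counts the pattern trees already covered (so it advances by il, a bookkeeping choice that differs from the 1-based indices on the slide), and top_down is passed in as a callback. The toy callback below only compares labels and stands in for the real top-down function purely so that the loop can be run.

```python
def scan(target_forest, pattern_forest, top_down, virtual_parent):
    """Scan T1, ..., Tk against P1, ..., Pq and return the number j of
    leading pattern trees found to be covered.

    `top_down(tree, remaining_patterns)` must return a pair (i, v);
    the pair only advances j when v is the common (virtual) parent.
    """
    l, j = 0, 0
    while l < len(target_forest) and j < len(pattern_forest):
        i, v = top_down(target_forest[l], pattern_forest[j:])
        if v is virtual_parent:
            j += i          # i further pattern trees are covered
        l += 1
    return j

# Toy stand-in (an assumption, not the real top-down function):
# trees are plain labels and a tree covers a pattern only if they are equal.
PV = object()

def toy_top_down(tree, patterns):
    return (1, PV) if tree == patterns[0] else (0, None)

covered = scan(["x", "a", "b", "y", "c"], ["a", "b", "c"], toy_top_down, PV)
```

In the toy run, the target trees x and y contribute nothing, and a, b, c are matched in order, so all three pattern trees end up covered.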
• If j > 0 when the loop terminates, bottom-up(T’, G) returns <j, p1’s parent>.
[Figure: the forest T’ = <T1, T2, ..., Ti, ..., Tk> covers the first j trees P1, ..., Pj of G = <P1, ..., Pq>.]
• Otherwise, j = 0. In this case, we continue searching for a pair <i, v> such that T’ contains the first i subtrees of v, where v ∈ δ⁻¹(v’) and v’ is the leaf node on the left-most path in P1, as described below.
i) Let <i1, v1>, <i2, v2>, ..., <ik, vk> be the respective return values of
top-down(T1, <P1, ..., Pq>),
top-down(T2, <P1, ..., Pq>), ...,
top-down(Tk, <P1, ..., Pq>).
Since j = 0, each vl ∈ δ⁻¹(v’) (l = 1, ..., k).
[Figure: the nodes v1, v2, ..., vk on the left-most path of P1.]
ii) If each il = 0, return a pair <0, v> in which v is a special value considered to be a descendant of any node in G. Otherwise, find the first vg with children w1, ..., wh such that vg is not a descendant of any other vj, and ig > 0. Call
bottom-up(<Tg+1, ..., Tk>, <P[wig+1], ..., P[wh]>).
• Let <x, y> be its return value. If y = vg, then the return value of bottom-up(T’, G) is set to be <ig + x, vg>.
• Otherwise, the return value is <ig, vg>.
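[Figure: the forest T’ = <T1, T2, ..., Tg, Tg+1, ..., Tk> and the nodes v1, ..., vg, ..., vk in P1, with the first ig subtrees of vg marked.]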
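The selection of vg and the combination of the two results can be sketched as below. The sketch rests on two assumptions. First, since every vl lies on the left-most path of P1, "not a descendant of any other vj" is decided by comparing depths. Second, only pairs with il > 0 are taken into account when looking for vg. The names pick_g and combine, and the example data, are hypothetical.

```python
def pick_g(results, depth):
    """Return the index g of the first pair (i, v) with i > 0 whose node v
    is shallowest on the left-most path, or None if every i is 0.

    `results` is the list [(i1, v1), ..., (ik, vk)];
    `depth` maps each node to its depth in the pattern.
    """
    best = None
    for g, (i, v) in enumerate(results):
        if i > 0 and (best is None or depth[v] < depth[results[best][1]]):
            best = g
    return best

def combine(ig, vg, x, y):
    """Merge <ig, vg> with the pair <x, y> returned by the follow-up
    bottom-up call on the remaining trees and remaining subtrees of vg."""
    return (ig + x, vg) if y == vg else (ig, vg)

# Hypothetical data: nodes on the left-most path of P1, named by strings.
depth = {"p1": 1, "u": 2, "w": 3}
results = [(0, "p1"), (1, "w"), (2, "u"), (1, "u")]
g = pick_g(results, depth)
```

In this example g is 2 (0-based): the pair <2, u> is chosen because u is shallower than w, and it is the first such pair with a positive count. If the follow-up call then returns <1, u>, combine gives <3, u>; if it returns a pair with a different node, the result stays <2, u>.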
Further improvements
In the case j = 0:
Let <i1, v1>, ..., <ik, vk> be the return values of top-down(T1, <P1, ..., Pq>), ..., top-down(Tk, <P1, ..., Pq>). We find the first vg such that it is not a descendant of any other vj and ig > 0. Then,
bottom-up(<Tg+1, ..., Tk>, <P[wig+1], ..., P[wh]>)
is invoked. This shows that all the return values except <ig, vg> are not used in the subsequent computation. Thus, the work of computing such values should be avoided.
Let <ij, vj> be the return value of top-down(Tj, <P1, ..., Pq>) such that ij > 0 and vj is p1 or a descendant of p1. Then, during the execution of top-down(Tj+1, <P1, ..., Pq>), once we have detected that it can only produce a return value <ij+1, vj+1> with vj+1 being a descendant of vj, we should stop the corresponding computation immediately, since this return value will not be used in the subsequent search. For this purpose, we extend top-down(Tj+1, <P1, ..., Pq>) to top-down(Tj+1, <P1, ..., Pq>, vj), with vj being used to transfer information, called a controlling node.
Assume that in the execution of top-down(Tj+1, <P1, ..., Pq>, vj), we have the following function calls:
top-down(Tj+1,1, <P1, ..., Pq>, u1) returns <a1, u1>,
top-down(Tj+1,2, <P1, ..., Pq>, u2) returns <a1, u2>,
… …
with all ui’s being proper descendants of vj. Then the bottom-up function call with some ui as a controlling node should not be conducted:
bottom-up(<Tj+1,i, ... >, <… …>, ui).
Summary
• An efficient method for the tree inclusion problem
- O(|T|·min{DP, |leaves(P)|}) time and
- O(|T| + |P|) space
where DP is the height of P, and leaves(P) is the set of the leaf nodes of P.
• Future work
- adapt the algorithm to a data stream environment
- adapt the algorithm to an indexing environment
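The factor min{DP, |leaves(P)|} in the time bound depends only on the pattern and is easy to compute, as the sketch below shows. It assumes the nested-tuple encoding (label, children) and counts height in nodes, so a single leaf has height 1; the slides do not state which height convention is used. The function names and the example pattern are hypothetical.

```python
def height(tree):
    """Number of nodes on the longest root-to-leaf path (assumed convention)."""
    _label, children = tree
    return 1 + max((height(child) for child in children), default=0)

def leaf_count(tree):
    """Number of leaf nodes in the tree."""
    _label, children = tree
    if not children:
        return 1
    return sum(leaf_count(child) for child in children)

def bound_factor(pattern):
    """The pattern-dependent factor min{D_P, |leaves(P)|} of the time bound."""
    return min(height(pattern), leaf_count(pattern))

# Hypothetical pattern: P = a(b(c), d), with height 3 and 2 leaves.
P = ("a", (("b", (("c", ()),)), ("d", ())))
factor = bound_factor(P)
```

For this pattern the factor is 2, so the stated time bound would be proportional to 2·|T|. A deep, narrow pattern is bounded through its leaf count, and a shallow, wide one through its height.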
Thank you.