

Algorithmica (1996) 15:205-222

Algorithmica © 1996 Springer-Verlag New York Inc.

A Constrained Edit Distance Between Unordered Labeled Trees¹

Kaizhong Zhang²

Abstract. This paper considers the problem of computing a constrained edit distance between unordered labeled trees. The problem of approximate unordered tree matching is also considered. We present dynamic programming algorithms solving these problems in sequential time $O(|T_1| \times |T_2| \times (\deg(T_1) + \deg(T_2)) \times \log_2(\deg(T_1) + \deg(T_2)))$. Our previous result shows that computing the edit distance between unordered labeled trees is NP-complete.

Key Words. Unordered trees, Constrained edit distance, Approximate tree matching.

1. Introduction. Unordered rooted labeled trees are trees whose nodes are labeled and in which only ancestor relationships are significant (the left-to-right order among siblings is not significant). Such trees arise naturally in many fields: genealogical studies, e.g., determining genetic diseases based on ancestry tree patterns; computer vision, e.g., object representation and recognition [7], [8]; chemistry, e.g., studying the relationship between the structure of chemical compounds and their properties [10], [4]; file systems, e.g., information retrieval in a Unix file system. For many such applications, it would be useful to compare unordered labeled trees by some meaningful distance metric.

The edit distance metric, used with success [6] for ordered labeled trees [17], [18], is one such natural metric. A previous result has shown that computing the edit distance for unordered labeled trees is NP-complete [19] (in fact MAX SNP-hard [16]). Since unordered trees are important in many applications, one would like to find distances that can be efficiently computed.

Motivated by these results, this paper considers a constrained edit distance metric between unordered labeled trees. The constraint we add is a natural one, namely, that disjoint subtrees be mapped to disjoint subtrees. In fact, some applications require this constraint (e.g., classification tree comparison) [11]. We present a polynomial-time algorithm to compute this constrained edit distance metric.

In Section 2 we review the previous NP-completeness and MAX SNP-hardness results on the edit distance metric between unordered trees. In Section 3 we introduce the new distance metric between trees, which we call the constrained edit distance. In Section 4 we investigate the properties of the constrained edit distance metric. In Section 5 we present a dynamic programming algorithm to compute the constrained edit distance metric and analyse the complexity. In Section 6 we consider the approximate unordered tree-matching problem.

¹ This research was supported by the Natural Sciences and Engineering Research Council of Canada under Grant No. OGP0046373.

² Department of Computer Science, University of Western Ontario, London, Ontario, Canada, N6A 5B7. [email protected].

Received October 16, 1993; revised May 18, 1994. Communicated by H. N. Gabow.

2. Preliminaries. In this section we first introduce some basic definitions and then review the previous results on unordered trees.

2.1. Trees. A free tree is a connected, acyclic, undirected graph. A rooted tree is a free tree in which one of the vertices is distinguished from the others. The distinguished vertex is called the root of the tree. We refer to a vertex of a rooted tree as a node of the tree. An unordered tree is just a rooted tree. We use the term unordered tree to distinguish it from the rooted, ordered tree defined below. An ordered tree is a rooted tree in which the children of each node are ordered. That is, if a node has $k$ children, then we can designate them as the first child, the second child, ..., and the $k$th child.

Unless otherwise stated, all trees we consider in the paper are unordered labeled trees.

2.2. Editing Operations. We consider three kinds of operations. Changing a node $n$ means changing the label on $n$. Deleting a node $n$ means making the children of $n$ become the children of the parent of $n$ and then removing $n$. Inserting is the complement of deleting. This means that inserting $n$ as the child of $m$ will make $n$ the parent of a subset (as opposed to a consecutive subsequence [17]) of the current children of $m$.
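The three operations can be sketched on a small in-memory tree. This is only an illustrative sketch: the dictionary-based representation and the names `parent`, `children`, `label`, `change`, `delete`, and `insert` are assumptions made for this example and do not come from the paper. Children are stored in sets, since sibling order has no meaning in an unordered tree.

```python
# Hypothetical representation (not from the paper): `parent`, `children`,
# and `label` are dicts keyed by node id; children are kept in sets because
# sibling order carries no meaning in an unordered tree.

def change(label, n, new_label):
    """Change node n: only its label is replaced."""
    label[n] = new_label

def delete(parent, children, label, n):
    """Delete non-root node n: its children become children of n's parent."""
    p = parent[n]
    children[p].remove(n)
    for c in children[n]:
        parent[c] = p
        children[p].add(c)
    del parent[n], children[n], label[n]

def insert(parent, children, label, n, n_label, m, adopted):
    """Insert n as a child of m; n adopts the subset `adopted` of m's children."""
    adopted = set(adopted)
    assert adopted <= children[m], "adopted nodes must be children of m"
    children[m] -= adopted
    children[m].add(n)
    parent[n], children[n], label[n] = m, adopted, n_label
    for c in adopted:
        parent[c] = n

# Tree r(a(b, c), d): deleting a and then re-inserting it restores the tree.
parent = {"r": None, "a": "r", "b": "a", "c": "a", "d": "r"}
children = {"r": {"a", "d"}, "a": {"b", "c"}, "b": set(), "c": set(), "d": set()}
label = {n: n.upper() for n in parent}

delete(parent, children, label, "a")
after_delete = {k: set(v) for k, v in children.items()}
insert(parent, children, label, "a", "A", "r", {"b", "c"})
```

After the deletion, `b` and `c` sit directly under `r`; the insertion picks exactly that subset of the children of `r` and places it under the new node, which is why insertion is described as the complement of deletion.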

Suppose each node label is a symbol chosen from an alphabet $\Sigma$. Let $\lambda$, a unique symbol not in $\Sigma$, denote the null symbol. Following [9], [17], and [19], we represent an edit operation as $a \to b$, where $a$ is either $\lambda$ or a label of a node in tree $T_1$ and $b$ is either $\lambda$ or a label of a node in tree $T_2$. We call $a \to b$ a change operation if $a \ne \lambda$ and $b \ne \lambda$; a delete operation if $b = \lambda$; and an insert operation if $a = \lambda$. Let $T_2$ be the tree that results from the application of an edit operation $a \to b$ to tree $T_1$; this is written $T_1 \to T_2$ via $a \to b$.

Let $S$ be a sequence $s_1, \ldots, s_k$ of edit operations. An $S$-derivation from tree $A$ to tree $B$ is a sequence of trees $A_0, \ldots, A_k$ such that $A = A_0$, $B = A_k$, and $A_{i-1} \to A_i$ via $s_i$ for $1 \le i \le k$. Let $\gamma$ be a cost function which assigns to each edit operation $a \to b$ a nonnegative real number $\gamma(a \to b)$.

We constrain $\gamma$ to be a distance metric. That is:

(i) $\gamma(a \to b) \ge 0$, $\gamma(a \to a) = 0$.

(ii) $\gamma(a \to b) = \gamma(b \to a)$.

(iii) $\gamma(a \to c) \le \gamma(a \to b) + \gamma(b \to c)$.

We extend $\gamma$ to the sequence of edit operations $S$ by letting $\gamma(S) = \sum_{i=1}^{|S|} \gamma(s_i)$.
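As a small illustration of conditions (i) to (iii) and of the extension of the cost function to sequences, the sketch below checks a candidate cost function on a finite set of symbols. The unit cost function, the use of `None` for the null symbol, and all function names are assumptions chosen for this example; the paper does not fix a particular cost function.

```python
from itertools import product

LAMBDA = None  # stands in for the null symbol; any value outside the alphabet works

def unit_cost(a, b):
    """Unit cost: 0 for a -> a, 1 for any other change, delete, or insert."""
    return 0.0 if a == b else 1.0

def is_metric(cost, symbols):
    """Check conditions (i)-(iii) on every combination of the given symbols."""
    for a, b, c in product(symbols, repeat=3):
        if cost(a, b) < 0 or cost(a, a) != 0:
            return False  # violates (i)
        if cost(a, b) != cost(b, a):
            return False  # violates (ii)
        if cost(a, c) > cost(a, b) + cost(b, c):
            return False  # violates (iii)
    return True

def sequence_cost(cost, ops):
    """Cost of a sequence S: the sum of the costs of its operations (a, b)."""
    return sum(cost(a, b) for a, b in ops)

def skewed_cost(a, b):
    """A non-metric example: changing a <-> b costs more than delete + insert."""
    if a == b:
        return 0.0
    return 5.0 if {a, b} == {"a", "b"} else 1.0

symbols = ["a", "b", LAMBDA]
S = [("a", "b"), ("b", LAMBDA), (LAMBDA, "a")]  # one change, one delete, one insert
```

The `skewed_cost` function fails the triangle inequality because changing `a` into `b` directly is more expensive than deleting `a` and inserting `b`.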

2.3. Editing Distance and Edit Distance Mapping. The results in this subsection are from [19]. We omit the proofs.

2.3.1. Editing Distance. The edit distance between two trees is defined [19] by considering the minimum cost edit operations sequence that transforms one tree to the other.


Formally the edit distance between $T_1$ and $T_2$ is defined as

$$D_e(T_1, T_2) = \min_{S} \{ \gamma(S) \mid S \text{ is an edit operation sequence taking } T_1 \text{ to } T_2 \}.$$

2.3.2. Editing Distance Mappings. The edit operations give rise to a mapping which is a graphical specification of what edit operations apply to each node in the two unordered labeled trees.

Given a tree, it is usually convenient to use a numbering to refer to the nodes of the tree. For an ordered tree $T$, the left-to-right postorder numbering or left-to-right preorder numbering is often used to number the nodes of $T$ from 1 to $|T|$, the size of tree $T$. Since we are concerned with unordered trees, we can fix an arbitrary order for the children of each node in the tree and then use left-to-right postorder numbering or left-to-right preorder numbering. We refer to the above as a numbering of the tree.

Suppose that we have a numbering for each tree. Let $t[i]$ be the $i$th node of tree $T$ in the given numbering. Formally we define a triple $(M_e, T_1, T_2)$ to be an edit distance mapping from $T_1$ to $T_2$, where $M_e$ is any set of ordered pairs of integers $(i, j)$ satisfying:

(0) $1 \le i \le |T_1|$, $1 \le j \le |T_2|$, where $|T_k|$ is the size of tree $T_k$, $k = 1, 2$.

(1) For any pair of $(i_1, j_1)$ and $(i_2, j_2)$ in $M_e$:

(a) $i_1 = i_2$ iff $j_1 = j_2$ (one-to-one).

(b) $t_1[i_1]$ is an ancestor of $t_1[i_2]$ iff $t_2[j_1]$ is an ancestor of $t_2[j_2]$ (ancestor order preserved).
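Conditions (0) and (1) can be checked mechanically on small examples. The sketch below is a hedged illustration: it assumes each tree is given as a hypothetical `parent` dictionary (node number to parent number, with `None` for the root), and it reads "ancestor" as proper ancestor, which is consistent with the one-to-one condition. The function names are not from the paper.

```python
from itertools import product

def is_proper_ancestor(parent, a, d):
    """True if node a lies strictly above node d in the tree."""
    v = parent[d]
    while v is not None:
        if v == a:
            return True
        v = parent[v]
    return False

def is_edit_distance_mapping(M, parent1, parent2):
    """Check conditions (0), (1a), and (1b) for a set M of pairs (i, j)."""
    # Condition (0): every index must name a node of its tree.
    if any(i not in parent1 or j not in parent2 for i, j in M):
        return False
    for (i1, j1), (i2, j2) in product(M, repeat=2):
        if (i1 == i2) != (j1 == j2):  # (1a) one-to-one
            return False
        if is_proper_ancestor(parent1, i1, i2) != is_proper_ancestor(parent2, j1, j2):
            return False  # (1b) ancestor order not preserved
    return True

# Two trees with a root (node 3) and two leaf children (nodes 1 and 2),
# numbered in postorder.
parent1 = {1: 3, 2: 3, 3: None}
parent2 = {1: 3, 2: 3, 3: None}

swap_leaves = {(1, 2), (2, 1), (3, 3)}  # siblings may be exchanged freely
leaf_to_root = {(1, 3), (3, 1)}         # reverses an ancestor relationship
```

Exchanging the two leaves is allowed because sibling order is not significant, whereas mapping a leaf to the root and the root to a leaf breaks condition (1b).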

We use $M_e$ instead of $(M_e, T_1, T_2)$ if there is no confusion. Let $M_e$ be an edit distance mapping from $T_1$ to $T_2$. Then we can define the cost of $M_e$:

$$\gamma(M_e) = \sum_{(i,j) \in M_e} \gamma(t_1[i] \to t_2[j]) + \sum_{\{i \mid \nexists j \text{ s.t. } (i,j) \in M_e\}} \gamma(t_1[i] \to \lambda) + \sum_{\{j \mid \nexists i \text{ s.t. } (i,j) \in M_e\}} \gamma(\lambda \to t_2[j]).$$
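The cost of a mapping is a direct sum over three groups of nodes: mapped pairs, nodes of $T_1$ that no pair touches (deleted), and nodes of $T_2$ that no pair touches (inserted). A minimal sketch, assuming hypothetical `label1` and `label2` dictionaries from node number to label and a unit cost function chosen only for illustration:

```python
LAMBDA = None  # stands in for the null symbol

def unit_cost(a, b):
    """Illustrative unit cost; any metric cost function could be used instead."""
    return 0 if a == b else 1

def mapping_cost(M, label1, label2, cost):
    """Sum of change costs for mapped pairs plus delete and insert costs."""
    touched1 = {i for i, _ in M}
    touched2 = {j for _, j in M}
    changes = sum(cost(label1[i], label2[j]) for i, j in M)
    deletes = sum(cost(label1[i], LAMBDA) for i in label1 if i not in touched1)
    inserts = sum(cost(LAMBDA, label2[j]) for j in label2 if j not in touched2)
    return changes + deletes + inserts

# T1 has nodes labeled b, c, a; T2 has nodes labeled c, a.
label1 = {1: "b", 2: "c", 3: "a"}
label2 = {1: "c", 2: "a"}

cost_good = mapping_cost({(2, 1), (3, 2)}, label1, label2, unit_cost)  # only b is deleted
cost_poor = mapping_cost({(1, 1), (3, 2)}, label1, label2, unit_cost)  # b -> c, c deleted
cost_empty = mapping_cost(set(), label1, label2, unit_cost)            # delete all, insert all
```

Note that this function only evaluates the cost; whether a given set of pairs is a valid mapping has to be checked separately against conditions (0) and (1).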

The relation between an edit distance mapping and a sequence of edit operations is as follows:

LEMMA 1. Given $S$, a sequence $s_1, \ldots, s_k$ of edit operations from $T_1$ to $T_2$, an edit distance mapping $M_e$ from $T_1$ to $T_2$ exists such that $\gamma(M_e) \le \gamma(S)$. Conversely, for any edit distance mapping $M_e$, a sequence of edit operations $S$ exists such that $\gamma(S) = \gamma(M_e)$.

Based on the lemma, the following theorem states the relation between the edit distance and the edit distance mappings. This is why we call this kind of mapping an edit distance mapping.

THEOREM 1.

$$D_e(T_1, T_2) = \min_{M_e} \{ \gamma(M_e) \mid M_e \text{ is an edit distance mapping from } T_1 \text{ to } T_2 \}.$$

One of the results in [19] is that finding $D_e(T_1, T_2)$ is NP-complete even if the trees are binary trees with a label alphabet of size two. Kilpeläinen and Mannila [3] showed that even the inclusion problem for unordered trees is NP-complete. In fact we recently proved a stronger result [16]: the problem of finding the minimum cost edit distance mapping (edit distance) between two unordered trees and the problem of finding the largest common subtree of two unordered trees are both MAX SNP-hard. This means that there is no polynomial-time approximation scheme (PTAS) for these problems unless P = NP [1]. Since unordered trees are important in many applications, it is desirable to find distances that can be efficiently computed.

3. A Constrained Edit Distance Metric Between Unordered Trees. Our new distance metric is based on a constraint on the mappings allowed between two trees. The intuitive idea is that two separate subtrees of $T_1$ should be mapped to two separate subtrees of $T_2$. This idea was proposed by Tanaka and Tanaka [11] in their definition of a structure preserving mapping between two ordered labeled trees, although the definition in [11] does not capture the idea precisely. Tanaka and Tanaka [11] also showed that in some applications (e.g., classification tree comparison) this kind of mapping is more meaningful than edit distance mapping. In a separate result [15], we developed an algorithm for the constrained edit distance between ordered labeled trees with time complexity $O(|T_1||T_2|)$. This paper extends the definition from ordered trees to unordered trees.

3.1. Constrained Edit Distance Mappings. Suppose again that we have a numbering for each tree. Let $t[i]$ be the $i$th node of tree $T$ in the given numbering. Let $T[i]$ be the subtree rooted at $t[i]$ and let $F[i]$ be the unordered forest obtained by deleting $t[i]$ from $T[i]$.

Formally we define a triple $(M, T_1, T_2)$ to be a constrained edit distance mapping from $T_1$ to $T_2$, where $M$ is any set of ordered pairs of integers $(i, j)$ satisfying:

(1) $M$ is an edit distance mapping.

(2) For any triple $(i_1, j_1)$, $(i_2, j_2)$, and $(i_3, j_3)$ in $M$, let $t_1[I]$ be $\mathrm{lca}(t_1[i_1], t_1[i_2])$ and let $t_2[J]$ be $\mathrm{lca}(t_2[j_1], t_2[j_2])$, where lca represents least common ancestor. Then $t_1[I]$ is a proper ancestor of $t_1[i_3]$ iff $t_2[J]$ is a proper ancestor of $t_2[j_3]$.
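Condition (2) can be tested by brute force over all triples of pairs. The sketch below is an illustration under assumptions: trees are given as hypothetical `parent` dictionaries, the function names are invented for this example, and the two example trees are not taken from the paper. It checks condition (2) only; condition (1) would be verified separately.

```python
from itertools import product

def path_to_root(parent, v):
    """Nodes on the path from v up to the root, starting with v itself."""
    path = [v]
    while parent[v] is not None:
        v = parent[v]
        path.append(v)
    return path

def lca(parent, u, v):
    """Least common ancestor: first node above v that is also on u's root path."""
    above_u = set(path_to_root(parent, u))
    return next(w for w in path_to_root(parent, v) if w in above_u)

def is_proper_ancestor(parent, a, d):
    return a != d and a in path_to_root(parent, d)

def satisfies_condition_2(M, parent1, parent2):
    """Check condition (2) for every triple of pairs in M (repeats allowed)."""
    for (i1, j1), (i2, j2), (i3, j3) in product(M, repeat=3):
        I = lca(parent1, i1, i2)
        J = lca(parent2, j1, j2)
        if is_proper_ancestor(parent1, I, i3) != is_proper_ancestor(parent2, J, j3):
            return False
    return True

# T1 = r(x(c, d), e) and T2 = r(c, y(d, e)), written as child -> parent.
parent1 = {"r": None, "x": "r", "c": "x", "d": "x", "e": "r"}
parent2 = {"r": None, "c": "r", "y": "r", "d": "y", "e": "y"}

leaves_identity = {("c", "c"), ("d", "d"), ("e", "e")}
two_leaves = {("c", "c"), ("e", "e")}
```

Mapping all three leaves to their namesakes satisfies condition (1), since no leaf is an ancestor of another, but it fails condition (2): in $T_1$ the lca of `c` and `d` is `x`, which is not an ancestor of `e`, while in $T_2$ the lca of `c` and `d` is the root, which is a proper ancestor of `e`.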

This definition adds condition (2) to the definition of the edit distance mapping. We examine the meaning of the constrained edit distance mappings. Suppose $t_1[I]$ is a descendant of $t_1[i_3]$. Then $t_1[i_1]$ and $t_1[i_2]$ must be descendants of $t_1[i_3]$. By condition (1), $t_2[j_1]$ and $t_2[j_2]$ must be descendants of $t_2[j_3]$. This in turn means that $t_2[J]$ is a descendant of $t_2[j_3]$. Therefore condition (1) implies that $t_1[I]$ is a descendant of $t_1[i_3]$ if and only if $t_2[J]$ is a descendant of $t_2[j_3]$. Condition (2) adds the constraint that $t_1[I]$ is a proper ancestor of $t_1[i_3]$ if and only if $t_2[J]$ is a proper ancestor of $t_2[j_3]$. Suppose now that neither $t_1[I]$ nor $t_1[i_3]$ is an ancestor of the other. Then by (1) and (2) neither $t_2[J]$ nor $t_2[j_3]$ is an ancestor of the other.

Consider two separate subtrees $T_1[i_1]$ and $T_1[i_2]$ such that neither $t_1[i_1]$ nor $t_1[i_2]$ is an ancestor of the other. Given a mapping $M$, let

$$M_k = \{ n \mid (m, n) \in M \text{ and } t_1[m] \text{ is a node of } T_1[i_k] \},$$

where $k = 1, 2$. Let $t_2[j_k] = \mathrm{lca}(M_k)$ for $k = 1, 2$. $T_2[j_k]$ can be considered as the image of $T_1[i_k]$ under mapping $M$. With our definition of the constrained edit distance mapping, neither $t_2[j_1]$ nor $t_2[j_2]$ is an ancestor of the other. This coincides with the intuitive idea that separate subtrees of $T_1$ should be mapped to separate subtrees of $T_2$.

We use $M$ instead of $(M, T_1, T_2)$ if there is no confusion. Let $M$ be a constrained edit distance mapping from $T_1$ to $T_2$; the cost of $M$, $\gamma(M)$, is well defined since $M$ is also an edit distance mapping.

Constrained edit distance mappings can be composed. Let $M_1$ be a constrained edit distance mapping from $T_1$ to $T_2$, and let $M_2$ be a constrained edit distance mapping from $T_2$ to $T_3$. Define

$$M_1 \circ M_2 = \{ (i, j) \mid \exists k \text{ s.t. } (i, k) \in M_1 \text{ and } (k, j) \in M_2 \}.$$
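Composition is a join of the two pair sets on the shared middle index. A minimal sketch, assuming mappings are stored as Python sets of `(i, j)` tuples (the function name `compose` is chosen for this example only):

```python
def compose(M1, M2):
    """Return {(i, j) : some k has (i, k) in M1 and (k, j) in M2}."""
    successor = dict(M2)  # k -> j; well defined because M2 is one-to-one
    return {(i, successor[k]) for i, k in M1 if k in successor}

M1 = {(1, 2), (2, 1), (3, 3)}  # from T1 to T2
M2 = {(1, 1), (3, 2)}          # from T2 to T3; node 2 of T2 is deleted
M12 = compose(M1, M2)
```

Node 1 of $T_1$ is mapped by `M1` to node 2 of $T_2$, which `M2` leaves untouched, so node 1 does not appear in the composed mapping.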

LEMMA 2.

(1) $M_1 \circ M_2$ is a constrained edit distance mapping between $T_1$ and $T_3$.

(2) $\gamma(M_1 \circ M_2) \le \gamma(M_1) + \gamma(M_2)$.

PROOF. (1) follows from the definition of constrained edit distance mappings. We check condition (2) only. Let $(i_1, j_1)$, $(i_2, j_2)$, and $(i_3, j_3)$ be in $M_1 \circ M_2$. By the definition of $M_1 \circ M_2$, there are $k_1$, $k_2$, and $k_3$ such that $(i_1, k_1)$, $(i_2, k_2)$, and $(i_3, k_3)$ are in $M_1$ and $(k_1, j_1)$, $(k_2, j_2)$, and $(k_3, j_3)$ are in $M_2$. Let $I$ be $\mathrm{lca}(i_1, i_2)$, let $K$ be $\mathrm{lca}(k_1, k_2)$, and let $J$ be $\mathrm{lca}(j_1, j_2)$. By the definition of $M_1$ and $M_2$, $I$ is a proper ancestor of $i_3$ iff $K$ is a proper ancestor of $k_3$. Moreover, $K$ is a proper ancestor of $k_3$ iff $J$ is a proper ancestor of $j_3$. Therefore $I$ is a proper ancestor of $i_3$ iff $J$ is a proper ancestor of $j_3$.

(2) Let $M_1$ be the constrained edit distance mapping from $T_1$ to $T_2$, let $M_2$ be the constrained edit distance mapping from $T_2$ to $T_3$, and let $M_1 \circ M_2$ be the composed constrained edit distance mapping from $T_1$ to $T_3$. Three general situations occur: $(i, j) \in M_1 \circ M_2$, $i \notin M_1$, or $j \notin M_2$. In each case this corresponds to an edit operation $\gamma(x \to y)$ where $x$ and $y$ may be nodes or may be $\lambda$. In all such cases, the triangle inequality on the distance metric $\gamma$ ensures that $\gamma(x \to y) \le \gamma(x \to z) + \gamma(z \to y)$. □

3.2. A Constrained Edit Distance Between Trees. We can now define a dissimilarity measure between $T_1$ and $T_2$ as:

$$D(T_1, T_2) = \min_{M} \{ \gamma(M) \mid M \text{ is a constrained edit distance mapping from } T_1 \text{ to } T_2 \}.$$

In fact this dissimilarity measure is a distance metric.

THEOREM 2.

(1) $D(T, T) = 0$.

(2) $D(T_1, T_2) = D(T_2, T_1)$.

(3) $D(T_1, T_3) \le D(T_1, T_2) + D(T_2, T_3)$.

PROOF. (1) and (2) follow directly from the definition of the constrained edit distance mapping. For (3), consider the minimum cost constrained edit distance mappings $M_1$ between $T_1$ and $T_2$ and $M_2$ between $T_2$ and $T_3$. It is easy to see the following:

$$D(T_1, T_3) \le \gamma(M_1 \circ M_2) \le \gamma(M_1) + \gamma(M_2) = D(T_1, T_2) + D(T_2, T_3).$$ □

We call this distance metric the constrained edit distance. The relation between the constrained edit distance metric $D$ and the edit distance metric $D_e$ is $D_e(T_1, T_2) \le D(T_1, T_2)$. The reason is that any constrained edit distance mapping is always an edit distance mapping.

4. Properties of the Constrained Edit Distance. In this section we present several lemmas which are the basis for the algorithm in the next section. We use $\theta$ to represent the empty tree.

LEMMA 3. Let $t_1[i_1], t_1[i_2], \ldots, t_1[i_{n_i}]$ be the children of $t_1[i]$ and let $t_2[j_1], t_2[j_2], \ldots, t_2[j_{n_j}]$ be the children of $t_2[j]$. Then

$$
\begin{aligned}
D(\theta, \theta) &= 0, \\
D(F_1[i], \theta) &= \sum_{k=1}^{n_i} D(T_1[i_k], \theta), \\
D(\theta, F_2[j]) &= \sum_{k=1}^{n_j} D(\theta, T_2[j_k]), \\
D(T_1[i], \theta) &= D(F_1[i], \theta) + \gamma(t_1[i] \to \lambda), \\
D(\theta, T_2[j]) &= D(\theta, F_2[j]) + \gamma(\lambda \to t_2[j]).
\end{aligned}
$$

PROOF. Trivial. □
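The base cases of Lemma 3 amount to summing delete (or insert) costs over a whole subtree. The following sketch shows the deletion side under assumptions: the tree is stored as a hypothetical `children` dictionary with a `label` dictionary, the cost function is the unit cost, and the function names are invented here. The insertion side is symmetric, with the arguments of the cost function exchanged.

```python
LAMBDA = None  # stands in for the null symbol

def unit_cost(a, b):
    """Illustrative unit cost; any metric cost function could be used instead."""
    return 0 if a == b else 1

def delete_forest(children, label, i, cost):
    """Cost of turning the forest below node i into the empty tree."""
    return sum(delete_tree(children, label, k, cost) for k in children[i])

def delete_tree(children, label, i, cost):
    """Cost of turning the subtree rooted at i into the empty tree."""
    return delete_forest(children, label, i, cost) + cost(label[i], LAMBDA)

# Tree r(x(c, d), e): five nodes in total.
children = {"r": ["x", "e"], "x": ["c", "d"], "c": [], "d": [], "e": []}
label = {n: n for n in children}

whole_tree = delete_tree(children, label, "r", unit_cost)
below_root = delete_forest(children, label, "r", unit_cost)
```

With unit costs, deleting a subtree costs exactly its number of nodes, and deleting the forest below a node costs one less than deleting the subtree rooted at that node.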

LEMMA 4. Let $t_1[i_1], t_1[i_2], \ldots, t_1[i_{n_i}]$ be the children of $t_1[i]$ and let $t_2[j_1], t_2[j_2], \ldots, t_2[j_{n_j}]$ be the children of $t_2[j]$. Then

$$
D(T_1[i], T_2[j]) = \min
\begin{cases}
D(\theta, T_2[j]) + \min_{1 \le t \le n_j} \{ D(T_1[i], T_2[j_t]) - D(\theta, T_2[j_t]) \}, \\
D(T_1[i], \theta) + \min_{1 \le s \le n_i} \{ D(T_1[i_s], T_2[j]) - D(T_1[i_s], \theta) \}, \\
D(F_1[i], F_2[j]) + \gamma(t_1[i] \to t_2[j]).
\end{cases}
$$

PROOF. Consider the minimum cost constrained edit distance mapping $M$ between $T_1[i]$ and $T_2[j]$. There are four cases:

(1) $i \in M$ and $j \notin M$.

(2) $i \notin M$ and $j \in M$.

(3) $i \in M$ and $j \in M$.

(4) $i \notin M$ and $j \notin M$.

Case 1. Let $(i, t)$ be in $M$. Since $j \notin M$, $t$ must be a node in $F_2[j]$. Let $t_2[j_t]$ be the child of $t_2[j]$ on the path from $t_2[t]$ to $t_2[j]$. Thus $D(T_1[i], T_2[j]) = D(T_1[i], T_2[j_t]) + D(\theta, T_2[j_1]) + \cdots + D(\theta, T_2[j_{t-1}]) + D(\theta, T_2[j_{t+1}]) + \cdots + D(\theta, T_2[j_{n_j}]) + \gamma(\lambda \to t_2[j])$. Since $D(\theta, T_2[j]) = \gamma(\lambda \to t_2[j]) + \sum_{k=1}^{n_j} D(\theta, T_2[j_k])$, we can rewrite the right-hand side of the formula as $D(\theta, T_2[j]) + D(T_1[i], T_2[j_t]) - D(\theta, T_2[j_t])$. Since the index $t$ of the child $t_2[j_t]$ can range from 1 to $n_j$, we take the minimum of these corresponding costs.

Case 2. Similar to Case 1.

Case 3. Since $i \in M$ and $j \in M$, by the condition of constrained edit distance mapping, $(i, j)$ must be in $M$. Since $M - \{(i, j)\}$ is a constrained edit distance mapping between $F_1[i]$ and $F_2[j]$, and for any constrained edit distance mapping $M'$ between $F_1[i]$ and $F_2[j]$, $\{(i, j)\} \cup M'$ is a constrained edit distance mapping between $T_1[i]$ and $T_2[j]$, we know $D(T_1[i], T_2[j]) = D(F_1[i], F_2[j]) + \gamma(t_1[i] \to t_2[j])$.

Case 4. Similar to Case 3. The formula would be $D(T_1[i], T_2[j]) = D(F_1[i], F_2[j]) + \gamma(t_1[i] \to \lambda) + \gamma(\lambda \to t_2[j])$. Since $\gamma(t_1[i] \to t_2[j]) \le \gamma(t_1[i] \to \lambda) + \gamma(\lambda \to t_2[j])$, we do not have to include this case in our final formula. □

Before we proceed to the next lemma we need the following definition.

A restricted mapping $RM(i, j)$ between $F_1[i]$ and $F_2[j]$ is defined as follows:

(1) $RM(i, j)$ is a constrained edit distance mapping between $F_1[i]$ and $F_2[j]$.

(2) If $(l, k)$ is in $RM(i, j)$, $t_1[l]$ is in $T_1[i_s]$, and $t_2[k]$ is in $T_2[j_t]$, then, for any $(l_1, k_1)$ in $RM(i, j)$, $t_1[l_1]$ is in $T_1[i_s]$ if and only if $t_2[k_1]$ is in $T_2[j_t]$.

Since a restricted mapping is a constrained edit distance mapping, the cost of a restricted mapping is well defined. The meaning of the restriction is that nodes from a subtree $T_1[i_s]$ can only map to nodes of one subtree $T_2[j_t]$ and vice versa. It turns out that, although there are exceptions, this is the general case when we consider the constrained edit distance mapping between $F_1[i]$ and $F_2[j]$.

LEMMA 5. Let $t_1[i_1], t_1[i_2], \ldots, t_1[i_{n_i}]$ be the children of $t_1[i]$ and let $t_2[j_1], t_2[j_2], \ldots, t_2[j_{n_j}]$ be the children of $t_2[j]$. Then

$$
D(F_1[i], F_2[j]) = \min
\begin{cases}
D(\theta, F_2[j]) + \min_{1 \le t \le n_j} \{ D(F_1[i], F_2[j_t]) - D(\theta, F_2[j_t]) \}, \\
D(F_1[i], \theta) + \min_{1 \le s \le n_i} \{ D(F_1[i_s], F_2[j]) - D(F_1[i_s], \theta) \}, \\
\min_{RM(i,j)} \gamma(RM(i, j)).
\end{cases}
$$

PROOF. We consider the minimum-cost constrained edit distance mapping $M$ between $F_1[i]$ and $F_2[j]$. There are four cases.

Case 1. There is a $1 \le t \le n_j$ such that if $(k, l) \in M$, then $t_2[l]$ is a node in subtree $T_2[j_t]$; and there are $(k_1, l_1)$ and $(k_2, l_2)$ in $M$ such that $t_1[k_1]$ is a node in $T_1[i_{s_1}]$ and $t_1[k_2]$ is a node in $T_1[i_{s_2}]$, where $1 \le s_1 \ne s_2 \le n_i$. Note that in this case $j_t$ cannot be in $M$. This is similar to Case 1 in Lemma 4, and hence we have the following formula: $D(F_1[i], F_2[j]) = D(\theta, F_2[j]) + \min_{1 \le t \le n_j} \{ D(F_1[i], F_2[j_t]) - D(\theta, F_2[j_t]) \}$.


Case 2. There is a $1 \le s \le n_i$ such that if $(k, l) \in M$, then $t_1[k]$ is a node in subtree $T_1[i_s]$; and there are $(k_1, l_1)$ and $(k_2, l_2)$ in $M$ such that $t_2[l_1]$ is a node in $T_2[j_{t_1}]$ and $t_2[l_2]$ is a node in $T_2[j_{t_2}]$, where $1 \le t_1 \ne t_2 \le n_j$. This is similar to Case 1.

Case 3. There are $s$ and $t$ such that if $(k, l) \in M$, then $t_1[k]$ is a node in $T_1[i_s]$ and $t_2[l]$ is a node in $T_2[j_t]$. In this case $M$ is a restricted mapping and therefore $D(F_1[i], F_2[j]) = \min_{RM(i,j)} \gamma(RM(i, j))$.

Case 4. There are $(k_1, l_1)$, $(k_2, l_2)$, $(x_1, y_1)$, $(x_2, y_2)$ in $M$ such that $t_1[k_1]$ is a node in $T_1[i_{s_1}]$, $t_1[k_2]$ is a node in $T_1[i_{s_2}]$, $t_2[y_1]$ is a node in $T_2[j_{t_1}]$, and $t_2[y_2]$ is a node in $T_2[j_{t_2}]$, where $1 \le s_1 \ne s_2 \le n_i$ and $1 \le t_1 \ne t_2 \le n_j$. We show that in this case the constrained edit distance mapping $M$ is a restricted mapping between $F_1[i]$ and $F_2[j]$.

Suppose this is not true. Then, without loss of generality, we assume that there are $(a_1, b_1)$ and $(a_2, b_2)$ in $M$ such that $t_1[a_1]$ and $t_1[a_2]$ belong to the same subtree and $t_2[b_1]$ and $t_2[b_2]$ belong to different subtrees. Let $(a_3, b_3) \in M$ be such that $t_1[a_3]$ and $t_1[a_1]$ belong to different subtrees of $F_1[i]$. Now consider $\mathrm{lca}(t_1[a_1], t_1[a_2])$ and $t_1[a_3]$. Since $t_1[a_1]$ and $t_1[a_2]$ belong to the same subtree, which is different from the subtree $t_1[a_3]$ belongs to, we know that $\mathrm{lca}(t_1[a_1], t_1[a_2])$ and $t_1[a_3]$ do not have the ancestor or descendant relationship. However, if we consider $\mathrm{lca}(t_2[b_1], t_2[b_2])$ and $t_2[b_3]$, it is easy to see that $\mathrm{lca}(t_2[b_1], t_2[b_2]) = t_2[j]$, which is an ancestor of $t_2[b_3]$. This means that $M$ is not a valid constrained edit distance mapping. Contradiction. □

From Lemma 5, in order to compute $D(F_1[i], F_2[j])$ we have to compute $\min_{RM(i,j)} \gamma(RM(i, j))$. From the definition of the restricted mapping, we know that nodes of one subtree of $F_1[i]$ can only map to nodes of one subtree of $F_2[j]$. This means that if $T_1[i_s]$ is involved in the mapping, then it will map to $T_2[j_t]$ for some $1 \le t \le n_j$; if it is not involved in the mapping, then it will map to an empty tree. This gives a kind of matching between subtrees of $F_1[i]$ and subtrees of $F_2[j]$. The next two lemmas establish the relationship between $\min_{RM(i,j)} \gamma(RM(i, j))$ and $D(T_1[i_s], T_2[j_t])$, where $1 \le s \le n_i$ and $1 \le t \le n_j$. We need the following definitions in the next two lemmas.

Let $I = \{i_1, i_2, \ldots, i_{n_i}\}$, $J = \{j_1, j_2, \ldots, j_{n_j}\}$, $A = \{a_1, a_2, \ldots, a_{n_j}\}$, and $B = \{b_1, b_2, \ldots, b_{n_i}\}$. We use $A$ and $B$ to represent empty trees. Let $L = I \cup A$ and $R = J \cup B$; we define a weighted bipartite graph $G(i, j) = (V, E)$ as follows:

(1) $V = L \cup R$.

(2) $E = L \times R$, where $\times$ stands for the Cartesian product.

(3) $w(u, v) = D(T_1[u], T_2[v])$ if $u \in I$ and $v \in J$,
    $w(u, v) = D(T_1[u], \theta)$ if $u \in I$ and $v \in B$,
    $w(u, v) = D(\theta, T_2[v])$ if $u \in A$ and $v \in J$,
    $w(u, v) = 0$ if $u \in A$ and $v \in B$.

A matching on $G(i, j)$ is a subset of edges $M \subseteq E$ such that, for all vertices $v \in V$, at most one edge of $M$ is incident on $v$. The size of $M$, $|M|$, is the number of edges in $M$. A maximum matching is a matching with maximum cardinality, that is, a matching $M$ such that, for any matching $M'$, we have $|M| \ge |M'|$. Let $MM(i, j)$ be a maximum matching on $G(i, j)$; then $|MM(i, j)| = n_i + n_j$.
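The definition of $G(i, j)$ can be made concrete with a small sketch. The code below builds the $(n_i + n_j) \times (n_i + n_j)$ weight table and finds the cheapest maximum matching by trying every permutation, which is only practical for tiny inputs. The function names and the input tables `d`, `del1`, and `ins2` are hypothetical; they stand for already computed values of $D(T_1[i_s], T_2[j_t])$, $D(T_1[i_s], \theta)$, and $D(\theta, T_2[j_t])$.

```python
from itertools import permutations

def build_weights(d, del1, ins2):
    """Weight table of G(i, j).

    d[s][t]  stands for D(T1[i_s], T2[j_t])   (assumed precomputed)
    del1[s]  stands for D(T1[i_s], theta)
    ins2[t]  stands for D(theta, T2[j_t])
    Rows are ordered I then A; columns are ordered J then B.
    """
    ni, nj = len(del1), len(ins2)
    size = ni + nj                      # |L| = |R| = n_i + n_j
    w = [[0] * size for _ in range(size)]
    for u in range(size):
        for v in range(size):
            if u < ni and v < nj:       # u in I, v in J
                w[u][v] = d[u][v]
            elif u < ni:                # u in I, v in B
                w[u][v] = del1[u]
            elif v < nj:                # u in A, v in J
                w[u][v] = ins2[v]
            # u in A, v in B keeps weight 0
    return w

def min_cost_maximum_matching(w):
    """Brute force over all perfect matchings (illustration only)."""
    n = len(w)
    return min(sum(w[u][p[u]] for u in range(n))
               for p in permutations(range(n)))

# Two subtrees on the left, one on the right.
w = build_weights(d=[[5], [1]], del1=[2, 2], ins2=[3])
print(min_cost_maximum_matching(w))
```

In this example the cheapest choice pairs the second left subtree with the right subtree (cost 1) and maps the first left subtree to an empty tree (cost 2).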


Let $MM(i, j)$ be a maximum matching on $G(i, j)$. The cost of $MM(i, j)$, the summation of the weights of the edges in $MM(i, j)$, is defined as follows:

$$\gamma(MM(i, j)) = \sum_{(u,v) \in MM(i,j)} w(u, v) = \sum_{\substack{(u,v) \in MM \\ u \in I,\ v \in J}} D(T_1[u], T_2[v]) + \sum_{\substack{(u,v) \in MM \\ u \in I,\ v \in B}} D(T_1[u], \theta) + \sum_{\substack{(u,v) \in MM \\ u \in A,\ v \in J}} D(\theta, T_2[v]).$$

LEMMA 6. $\min_{RM(i,j)} \gamma(RM(i, j)) = \min_{MM(i,j)} \gamma(MM(i, j))$.

PROOF. Given a restricted mapping $RM(i, j)$, we define a maximum matching $MM(i, j)$ on $G(i, j)$ as follows:

$$MM(i, j) = HMM \cup IMM \cup JMM \cup KMM,$$

where

$HMM = \{(i_s, j_t) \mid$ there is $(k, l) \in RM(i, j)$ s.t. $t_1[k]$ ($t_2[l]$) is a node in $T_1[i_s]$ ($T_2[j_t]$)$\}$,

$IMM = \{(i_s, b_s) \mid$ there is no $j_t$ s.t. $(i_s, j_t) \in HMM\}$,

$JMM = \{(a_t, j_t) \mid$ there is no $i_s$ s.t. $(i_s, j_t) \in HMM\}$,

$KMM = \{(a_t, b_s) \mid (i_s, j_t) \in HMM\}$.

By the definition of a restricted mapping this is indeed a maximum matching. Furthermore, it is easy to see that $\gamma(MM(i, j)) \le \gamma(RM(i, j))$.

On the other hand, given a maximum matching $MM(i, j)$, we can construct a restricted mapping $RM(i, j)$ as follows:

$RM(i, j) = \{(k, l) \mid$ there exists $(i_s, j_t) \in MM(i, j)$ s.t. $(k, l)$ is in the minimum cost constrained edit distance mapping between $T_1[i_s]$ and $T_2[j_t]\}$.

It is clear that this is indeed a restricted mapping. Therefore, $\gamma(RM(i, j)) \le \gamma(MM(i, j))$. Hence $\min_{RM(i,j)} \gamma(RM(i, j)) = \min_{MM(i,j)} \gamma(MM(i, j))$. □

Combining Lemmas 5 and 6, we have proved the following lemma.


LEMMA 7. Let $t_1[i_1], t_1[i_2], \ldots, t_1[i_{n_i}]$ be the children of $t_1[i]$ and let $t_2[j_1], t_2[j_2], \ldots, t_2[j_{n_j}]$ be the children of $t_2[j]$. Then

$$D(F_1[i], F_2[j]) = \min \begin{cases} D(\theta, F_2[j]) + \min_{1 \le t \le n_j}\{D(F_1[i], F_2[j_t]) - D(\theta, F_2[j_t])\}, \\ D(F_1[i], \theta) + \min_{1 \le s \le n_i}\{D(F_1[i_s], F_2[j]) - D(F_1[i_s], \theta)\}, \\ \min_{MM(i,j)} \gamma(MM(i, j)). \end{cases}$$

5. Algorithm and Complexity. We first consider how to compute $\min_{MM(i,j)} \gamma(MM(i, j))$ and then present our simple algorithm.

5.1. Algorithm. From the definition of $MM(i, j)$ and $\gamma(MM(i, j))$, it is clear that this problem is the minimum cost maximum bipartite matching problem. Therefore we can use the weighted maximum matching algorithm to solve this problem. However, since we have many empty trees in graph $G(i, j)$, this will result in redundant computation. Instead, we reduce this problem directly to the minimum cost maximum flow problem by adding only two empty trees, one to $F_1[i]$ and the other to $F_2[j]$.

Given $F_1[i]$ and $F_2[j]$, let $I = \{i_1, i_2, \ldots, i_{n_i}\}$ and $J = \{j_1, j_2, \ldots, j_{n_j}\}$, where $i_k$, $1 \le k \le n_i$, represents tree $T_1[i_k]$, and $j_l$, $1 \le l \le n_j$, represents tree $T_2[j_l]$.

We construct a graph $G = (V, E)$ as follows:

vertex set: $V = \{s, t, e_i, e_j\} \cup I \cup J$, where $s$ is the source, $t$ is the sink, and $e_i$ and $e_j$ represent two empty trees;

edge set: $[s, i_k]$, $[s, e_i]$, $[j_l, t]$, $[e_j, t]$ with cost zero, $[i_k, j_l]$ with cost $D(T_1[i_k], T_2[j_l])$, $[e_i, j_l]$ with cost $D(\theta, T_2[j_l])$, $[i_k, e_j]$ with cost $D(T_1[i_k], \theta)$, and $[e_i, e_j]$ with cost zero. All the edges have capacity one except $[s, e_i]$, $[e_i, e_j]$, and $[e_j, t]$, whose capacities are $n_j$, $\max\{n_i, n_j\} - \min\{n_i, n_j\}$, and $n_i$, respectively.

$G$ is a network with integer capacities, nonnegative costs, and the maximum flow $f^* = n_i + n_j$ (see Figure 1). Let $cost(G)$ be the cost of the minimum cost maximum flow of $G$, where only the costs on positive flows are considered. The following lemma states the relation between $cost(G)$ and $\min_{MM(i,j)} \gamma(MM(i, j))$, which enables us to use the minimum cost flow algorithm to compute $\min_{MM(i,j)} \gamma(MM(i, j))$.

LEMMA 8. $cost(G) = \min_{MM(i,j)} \gamma(MM(i, j))$.

PROOF. Given an $MM(i, j)$, $\gamma(MM(i, j))$ represents the cost of the following maximum flow on $G$: for any $(i_s, j_t) \in MM(i, j)$ the flow on edge $[i_s, j_t]$ is one; for any $(i_s, b_t) \in MM(i, j)$ the flow on edge $[i_s, e_j]$ is one; for any $(a_s, j_t) \in MM(i, j)$ the flow on edge $[e_i, j_t]$ is one; the flow on edge $[e_i, e_j]$ is $|\{(u, v) \mid (u, v) \in MM(i, j)$ and $u \in A$ and $v \in B\}|$; the flows on edge $[s, i_k]$ and edge $[j_l, t]$ are one; the flow on edge $[s, e_i]$ is $n_j$; the flow on edge $[e_j, t]$ is $n_i$; and all the flows on the other edges are zero.

On the other hand, given a maximum flow on $G$, $cost(G)$ represents the cost of the following maximum matching on $G(i, j)$:

$MM(i, j) = \{(i_s, j_t), (a_t, b_s) \mid$ the flow on edge $[i_s, j_t]$ is one$\}$
$\quad \cup\ \{(i_s, b_s) \mid$ the flow on edge $[i_s, e_j]$ is one$\}$
$\quad \cup\ \{(a_t, j_t) \mid$ the flow on edge $[e_i, j_t]$ is one$\}$.

Therefore the cost of the minimum cost maximum flow is exactly $\min_{MM(i,j)} \gamma(MM(i, j))$. □

Fig. 1. Reduction to the minimum cost flow problem.

We are now ready to give our algorithm.


Input: $T_1$ and $T_2$.

Output: $D(T_1[i], T_2[j])$, where $1 \le i \le |T_1|$ and $1 \le j \le |T_2|$.

$D(\theta, \theta) = 0$;

for $i = 1$ to $|T_1|$
    $D(F_1[i], \theta) = \sum_{k=1}^{n_i} D(T_1[i_k], \theta)$
    $D(T_1[i], \theta) = D(F_1[i], \theta) + \gamma(t_1[i] \to \lambda)$

for $j = 1$ to $|T_2|$
    $D(\theta, F_2[j]) = \sum_{k=1}^{n_j} D(\theta, T_2[j_k])$
    $D(\theta, T_2[j]) = D(\theta, F_2[j]) + \gamma(\lambda \to t_2[j])$

for $i = 1$ to $|T_1|$
    for $j = 1$ to $|T_2|$

        $$D(F_1[i], F_2[j]) = \min \begin{cases} D(\theta, F_2[j]) + \min_{1 \le t \le n_j}\{D(F_1[i], F_2[j_t]) - D(\theta, F_2[j_t])\}, \\ D(F_1[i], \theta) + \min_{1 \le s \le n_i}\{D(F_1[i_s], F_2[j]) - D(F_1[i_s], \theta)\}, \\ \min_{MM(i,j)} \gamma(MM(i, j)). \end{cases}$$

        $$D(T_1[i], T_2[j]) = \min \begin{cases} D(\theta, T_2[j]) + \min_{1 \le t \le n_j}\{D(T_1[i], T_2[j_t]) - D(\theta, T_2[j_t])\}, \\ D(T_1[i], \theta) + \min_{1 \le s \le n_i}\{D(T_1[i_s], T_2[j]) - D(T_1[i_s], \theta)\}, \\ D(F_1[i], F_2[j]) + \gamma(t_1[i] \to t_2[j]). \end{cases}$$
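The recurrences of the algorithm can be exercised on small trees with the following minimal sketch. It makes several assumptions that are not part of the paper: trees are encoded as nested tuples `(label, (child, ...))`, the cost function `gamma` is the symmetric unit cost (so $D(T, \theta) = D(\theta, T)$), and the matching term $\min_{MM(i,j)} \gamma(MM(i, j))$ is computed by a dynamic program over subsets of the second forest instead of the minimum cost maximum flow reduction. The subset approach is exponential in the degree and serves only as an illustration.

```python
from functools import lru_cache

def gamma(a, b):
    """Assumed unit cost; None plays the role of the null label lambda."""
    return 0 if a == b else 1

@lru_cache(maxsize=None)
def tree_del(t):
    """D(T, theta); equals D(theta, T) because gamma is symmetric."""
    return gamma(t[0], None) + sum(tree_del(c) for c in t[1])

def forest_del(t):
    """D(F, theta) for the forest of children of t."""
    return sum(tree_del(c) for c in t[1])

def min_matching(k1, k2):
    """min over MM(i,j): each subtree is paired with a distinct subtree
    of the other forest or with an empty tree (subset DP, not min cost flow)."""
    @lru_cache(maxsize=None)
    def go(s, used):
        if s == len(k1):   # remaining subtrees of F2 map to empty trees
            return sum(tree_del(c) for t, c in enumerate(k2)
                       if not used >> t & 1)
        best = tree_del(k1[s]) + go(s + 1, used)
        for t in range(len(k2)):
            if not used >> t & 1:
                best = min(best, tree_dist(k1[s], k2[t])
                           + go(s + 1, used | 1 << t))
        return best
    return go(0, 0)

@lru_cache(maxsize=None)
def forest_dist(t1, t2):
    """D(F1[i], F2[j]) for the child forests of t1 and t2 (Lemma 7)."""
    k1, k2 = t1[1], t2[1]
    if not k1:
        return forest_del(t2)
    if not k2:
        return forest_del(t1)
    a = forest_del(t2) + min(forest_dist(t1, c) - forest_del(c) for c in k2)
    b = forest_del(t1) + min(forest_dist(c, t2) - forest_del(c) for c in k1)
    return min(a, b, min_matching(k1, k2))

@lru_cache(maxsize=None)
def tree_dist(t1, t2):
    """D(T1[i], T2[j]) following the last recurrence of the algorithm."""
    k1, k2 = t1[1], t2[1]
    best = forest_dist(t1, t2) + gamma(t1[0], t2[0])
    if k2:
        best = min(best, tree_del(t2)
                   + min(tree_dist(t1, c) - tree_del(c) for c in k2))
    if k1:
        best = min(best, tree_del(t1)
                   + min(tree_dist(c, t2) - tree_del(c) for c in k1))
    return best

leaf = lambda x: (x, ())
print(tree_dist(("a", (leaf("b"), leaf("c"))), ("a", (leaf("c"), leaf("b")))))
print(tree_dist(("a", (leaf("b"),)), ("a", (("x", (leaf("b"),)),))))
```

The first call compares two trees that differ only in sibling order; the second compares a tree with a copy in which one node `x` has been inserted between the root and its child.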

5.2. Complexity. The complexity of computing $D(T_1[i], T_2[j])$ is, by Lemma 4, bounded by $O(n_i + n_j)$. The complexity of computing $D(F_1[i], F_2[j])$ is bounded by $O(n_i + n_j)$ plus the complexity of the minimum cost maximum flow computation. Our graph $G$ is a graph with integer capacities, nonnegative edge costs, and maximum flow $f^* = n_i + n_j$. The complexity of finding minimum cost maximum flow for such a graph with $n$ vertices and $m$ edges is $O(m |f^*| \log_{(2+m/n)} n) \le O(m |f^*| \log_2 n)$ [12]. For our graph, $n = n_i + n_j + 4$ and $m = n_i \times n_j + 2n_i + 2n_j + 3$; therefore the complexity is bounded by $O(n_i \times n_j \times (n_i + n_j) \times \log_2(n_i + n_j))$.
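The vertex and edge counts used in this bound can be checked against the construction of $G$ with a short sketch. The function name and the string vertex names are hypothetical; costs and capacities are left out because only the counts enter the bound.

```python
def build_flow_network(ni, nj):
    """Vertices and edges (tail, head) of G for forests with ni and nj subtrees."""
    I = [f"i{k}" for k in range(1, ni + 1)]
    J = [f"j{l}" for l in range(1, nj + 1)]
    edges = []
    edges += [("s", i) for i in I]              # [s, i_k]
    edges += [(j, "t") for j in J]              # [j_l, t]
    edges += [(i, j) for i in I for j in J]     # [i_k, j_l]
    edges += [("ei", j) for j in J]             # [e_i, j_l]
    edges += [(i, "ej") for i in I]             # [i_k, e_j]
    edges += [("s", "ei"), ("ei", "ej"), ("ej", "t")]
    vertices = {"s", "t", "ei", "ej", *I, *J}
    return vertices, edges

ni, nj = 3, 2
vertices, edges = build_flow_network(ni, nj)
print(len(vertices), ni + nj + 4)
print(len(edges), ni * nj + 2 * ni + 2 * nj + 3)
```

Each printed line shows the counted value next to the closed form from the text, so the two numbers on a line should agree.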

Hence for any pair $i$ and $j$, the complexity of computing $D(T_1[i], T_2[j])$ and $D(F_1[i], F_2[j])$ is bounded by $O(n_i \times n_j \times (n_i + n_j) \times \log_2(n_i + n_j))$. Therefore the complexity of our algorithm is

$$\sum_{i=1}^{|T_1|} \sum_{j=1}^{|T_2|} O(n_i \times n_j \times (n_i + n_j) \times \log_2(n_i + n_j))$$

$$\le \sum_{i=1}^{|T_1|} \sum_{j=1}^{|T_2|} O(n_i \times n_j \times (\deg(T_1) + \deg(T_2)) \times \log_2(\deg(T_1) + \deg(T_2)))$$

$$\le O\Big((\deg(T_1) + \deg(T_2)) \times \log_2(\deg(T_1) + \deg(T_2)) \times \sum_{i=1}^{|T_1|} \sum_{j=1}^{|T_2|} n_i \times n_j\Big)$$

$$\le O(|T_1| \times |T_2| \times (\deg(T_1) + \deg(T_2)) \times \log_2(\deg(T_1) + \deg(T_2))).$$
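The last inequality of this derivation rests on a counting fact that is not spelled out: every node other than the root is a child of exactly one node, so the child counts of a tree sum to its size minus one. A short sketch of that step, in the notation of this section:

```latex
\sum_{i=1}^{|T_1|} \sum_{j=1}^{|T_2|} n_i \, n_j
  = \Big( \sum_{i=1}^{|T_1|} n_i \Big) \Big( \sum_{j=1}^{|T_2|} n_j \Big)
  = (|T_1| - 1)(|T_2| - 1)
  \le |T_1| \times |T_2| .
```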

6. Approximate Unordered Tree Matching. Approximate unordered tree matching is a natural extension of approximate string matching [5], [13], [2] and approximate ordered tree matching [17].

The approximate-string-matching problem is as follows. Given two strings TEXT and PAT, the problem is to compute, for each $i$, $SD[i, PAT] = \min_j\{D(TEXT[j..i], PAT)\}$, where $1 \le j \le i + 1$ and $SD$ is the string distance metric. With the edit distance metric, the problem is to compute, for each $i$, the minimum number of edit operations between the "pattern" string $PAT[1..|PAT|]$ and the "text" substring $TEXT[1..i]$, where any prefix can be removed from $TEXT[1..i]$. Intuitively the problem is to find the "occurrence" in TEXT that most closely matches PAT.
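The string version of the problem has a well-known dynamic programming solution in the style of Sellers [5]: the ordinary edit distance table is used, except that an empty pattern prefix costs nothing at every text position, which models the free removal of a text prefix. The sketch below assumes unit edit costs; the function name is hypothetical.

```python
def approx_string_match(text, pat):
    """Return [SD[0, PAT], ..., SD[len(text), PAT]] under unit edit costs.

    SD[i, PAT] is the minimum edit distance between pat and some
    substring of text that ends at position i (possibly the empty one).
    """
    m = len(pat)
    col = list(range(m + 1))   # against an empty text: delete pat[0:k]
    out = [col[m]]
    for ch in text:
        new = [0]              # empty pattern prefix: the text prefix is removed for free
        for k in range(1, m + 1):
            new.append(min(col[k] + 1,                      # ch is an extra text symbol
                           new[k - 1] + 1,                  # pat[k-1] has no partner
                           col[k - 1] + (pat[k - 1] != ch)))  # match or substitute
        col = new
        out.append(col[m])
    return out

print(approx_string_match("abcdef", "cd"))
```

The position with the smallest value marks where the closest occurrence of the pattern ends; here that is position 4, right after the substring `cd`.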

Given pattern tree $P$ and data tree $T$, in order to allow $P$ to match only a part of $T$, we must generalize the notion of a substring and the notion of removing a prefix. For us, a substring means a subtree and a prefix means a collection of subtrees.

Assume a numbering of tree $T$; we use $t[i]$ to represent the $i$th node of tree $T$, $T[i]$ to represent the subtree rooted at $t[i]$, and $T_f[i]$ to represent the forest obtained by deleting $t[i]$ from $T[i]$. Similarly, for pattern tree $P$ we can define $p[i]$, $P[i]$, and $P_f[i]$.

We first define the operation of removing at a node [17]. Removing at node $t[i]$ means removing the subtree rooted at $t[i]$. Given tree $T$, we define $S(T)$ as a set of nodes of $T$ satisfying

$t[i], t[j] \in S(T)$ implies that neither is an ancestor of the other.

$S(T)$ represents a set of nonoverlapping subtrees of $T$.

Define $R(T, S(T))$ to be the tree obtained by removing at all the nodes in $S(T)$ from tree $T$.

Now we can give the definition of approximate unordered tree matching. Given pattern tree $P$ and data tree $T$, for each subtree $T[i]$, we want to compute

$$D_r(P, T[i]) = \min_{S(T[i])}\{D(P, R(T[i], S(T[i])))\}.$$

The minimum here is over all possible subtree sets $S(T[i])$.

The intuitive meaning of the above definition is as follows. Given a pattern tree $P$ and a data tree $T$, with approximate tree matching we want to find part of $T$, say $T_p$, that most closely matches $P$. We assume that $T_p$ is connected, therefore $T_p$ must be a tree. Since we consider rooted trees, $T_p$ has a root. Let $t[i]$ be the root of $T_p$; it is clear that we can obtain $T_p$ by removing some subtrees, say $S(T[i])$, from $T[i]$ since $T_p$ is connected. Hence, $\min_i D_r(P, T[i])$ will give us the distance between $P$ and $T_p$.

In the following we use $n_i$ to represent the number of children of the node $p[i]$ and $n_j$ to represent the number of children of the node $t[j]$.


LEMMA 9. Let $p[i_1], p[i_2], \ldots, p[i_{n_i}]$ be the children of $p[i]$ and let $t[j_1], t[j_2], \ldots, t[j_{n_j}]$ be the children of $t[j]$. Then

$D_r(\theta, \theta) = 0$,

$D_r(\theta, T_f[j]) = 0$,

$D_r(P_f[i], \theta) = \sum_{k=1}^{n_i} D_r(P[i_k], \theta)$,

$D_r(\theta, T[j]) = 0$,

$D_r(P[i], \theta) = D_r(P_f[i], \theta) + \gamma(p[i] \to \lambda)$.

PROOF. Trivial. □

LEMMA 10. Let $p[i_1], p[i_2], \ldots, p[i_{n_i}]$ be the children of $p[i]$ and let $t[j_1], t[j_2], \ldots, t[j_{n_j}]$ be the children of $t[j]$. Then

$$D_r(P[i], T[j]) = \min \begin{cases} D_r(P[i], \theta), \\ \gamma(\lambda \to t[j]) + \min_{1 \le t \le n_j} D_r(P[i], T[j_t]), \\ D_r(P[i], \theta) + \min_{1 \le s \le n_i}\{D_r(P[i_s], T[j]) - D_r(P[i_s], \theta)\}, \\ D_r(P_f[i], T_f[j]) + \gamma(p[i] \to t[j]). \end{cases}$$

PROOF. We consider the best mapping between $P[i]$ and $T[j]$ after performing an optimal removal of subtrees of $T[j]$. If the whole tree $T[j]$ is removed, then the distance should be $D_r(P[i], \theta)$. Otherwise we have the same four cases as in Lemma 4. In case 1 $p[i]$ is in the best mapping but $t[j]$ is not. Suppose that $p[i]$ is mapped to $t[k]$ which is a node in $T[j_t]$. Then subtrees $T[j_1], \ldots, T[j_{t-1}], T[j_{t+1}], \ldots, T[j_{n_j}]$ should have been removed in the optimal removal of subtrees of $T[j]$. Therefore the distance is $\gamma(\lambda \to t[j]) + D_r(P[i], T[j_t])$. The other three cases are the same as in Lemma 4. □

In the following lemma we again use the concept of a maximum matching on a weighted bipartite graph. However, we have to modify the definition of the bipartite graph: Let $I = \{i_1, i_2, \ldots, i_{n_i}\}$, $J = \{j_1, j_2, \ldots, j_{n_j}\}$, and $B = \{b_1, b_2, \ldots, b_{n_i}\}$. We, again, use $B$ to represent empty trees. Let $L = I$ and $R = J \cup B$; we define a weighted bipartite graph $G_r(i, j) = (V, E)$ as follows:

(1) $V = L \cup R$.

(2) $E = L \times R$, where $\times$ stands for the Cartesian product.

(3) $w(u, v) = D_r(P[u], T[v])$ if $u \in I$ and $v \in J$,
    $w(u, v) = D_r(P[u], \theta)$ if $u \in I$ and $v \in B$.

We use $MM_r(i, j)$ to represent a maximum matching on graph $G_r(i, j)$.


LEMMA 11. Let $p[i_1], p[i_2], \ldots, p[i_{n_i}]$ be the children of $p[i]$ and let $t[j_1], t[j_2], \ldots, t[j_{n_j}]$ be the children of $t[j]$. Then

$$D_r(P_f[i], T_f[j]) = \min \begin{cases} \min_{1 \le t \le n_j}\{D_r(P_f[i], T_f[j_t]) + \gamma(\lambda \to t[j_t])\}, \\ D_r(P_f[i], \theta) + \min_{1 \le s \le n_i}\{D_r(P_f[i_s], T_f[j]) - D_r(P_f[i_s], \theta)\}, \\ \min_{MM_r(i,j)} \gamma(MM_r(i, j)). \end{cases}$$

PROOF. Again we consider the best mapping after performing the optimal removal. If $T_f[j]$ is removed, then the distance should be $D_r(P_f[i], \theta)$. However, this is handled by $\min_{MM_r(i,j)} \gamma(MM_r(i, j))$ since $\gamma(MM_r(i, j)) = D_r(P_f[i], \theta)$ when $MM_r(i, j) = \{(i_s, b_s) \mid 1 \le s \le n_i\}$.

Suppose that $T_f[j]$ is not removed; then we have four cases as in Lemma 5. Case 1 is similar to case 1 in Lemma 5. The only difference is that we do not need to consider the costs of inserting subtrees of $T_f[j]$ which are not in the mapping since they should have been removed. Case 2 is the same as case 2 in Lemma 5. Case 3 is a special case for $\min_{MM_r(i,j)} \gamma(MM_r(i, j))$, where $MM_r(i, j)$ contains only one element from $I \times J$. Case 4 is similar to case 4 of Lemma 5. $G_r(i, j)$ and $MM_r(i, j)$ are defined to deal with this case where we do not need to consider those subtrees of $T_f[j]$ that are not in the best mapping. □

We again reduce the computation of $\min_{MM_r(i,j)} \gamma(MM_r(i, j))$ to the minimum cost flow problem.

Given $P[i]$ and $T[j]$, let $I = \{i_1, i_2, \ldots, i_{n_i}\}$ and $J = \{j_1, j_2, \ldots, j_{n_j}\}$, where $i_k$, $1 \le k \le n_i$, represents tree $P[i_k]$, and $j_l$, $1 \le l \le n_j$, represents tree $T[j_l]$.

We construct a graph $G_r = (V, E)$ as follows:

vertex set: $V = \{s, t, e, f\} \cup I \cup J$, where $s$ is the source, $t$ is the sink, and $e$ represents an empty tree;

edge set: $[s, i_k]$, $[j_l, f]$, $[e, f]$, $[f, t]$ with cost zero, $[i_k, j_l]$ with cost $D_r(P[i_k], T[j_l])$, and $[i_k, e]$ with cost $D_r(P[i_k], \theta)$. All the edges have capacity one except $[e, f]$ and $[f, t]$, whose capacities are $n_i$.

$G_r$ is a network with integer capacities, nonnegative costs, and the maximum flow $f^* = n_i$ (see Figure 2).

We can prove that $cost(G_r) = \min_{MM_r(i,j)} \gamma(MM_r(i, j))$. The proof is similar to the proof of Lemma 8.

We are now ready to give an algorithm for approximate unordered tree matching. The time complexity of the algorithm is $O(|P| \times |T| \times \deg(P) \times \log_2(\deg(P) + \deg(T)))$ since $f^*$ for $G_r$ is $n_i$ instead of $n_i + n_j$.

Fig. 2. Reduction to the minimum cost flow problem.

Input: $P$ and $T$.

Output: $D_r(P, T[i])$, where $1 \le i \le |T|$.

$D_r(\theta, \theta) = 0$;

for $j = 1$ to $|T|$
    $D_r(\theta, T_f[j]) = 0$
    $D_r(\theta, T[j]) = 0$

for $i = 1$ to $|P|$
    $D_r(P_f[i], \theta) = \sum_{k=1}^{n_i} D_r(P[i_k], \theta)$
    $D_r(P[i], \theta) = D_r(P_f[i], \theta) + \gamma(p[i] \to \lambda)$

for $i = 1$ to $|P|$
    for $j = 1$ to $|T|$

        $$D_r(P_f[i], T_f[j]) = \min \begin{cases} \min_{1 \le t \le n_j}\{D_r(P_f[i], T_f[j_t]) + \gamma(\lambda \to t[j_t])\}, \\ D_r(P_f[i], \theta) + \min_{1 \le s \le n_i}\{D_r(P_f[i_s], T_f[j]) - D_r(P_f[i_s], \theta)\}, \\ \min_{MM_r(i,j)} \gamma(MM_r(i, j)). \end{cases}$$

        $$D_r(P[i], T[j]) = \min \begin{cases} D_r(P[i], \theta), \\ \gamma(\lambda \to t[j]) + \min_{1 \le t \le n_j} D_r(P[i], T[j_t]), \\ D_r(P[i], \theta) + \min_{1 \le s \le n_i}\{D_r(P[i_s], T[j]) - D_r(P[i_s], \theta)\}, \\ D_r(P_f[i], T_f[j]) + \gamma(p[i] \to t[j]). \end{cases}$$
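The recurrences for $D_r$ can be tried out with the following minimal sketch, under the same assumptions as before and none of them from the paper: nested tuples `(label, (child, ...))` as the tree encoding, a unit cost `gamma`, and a subset dynamic program standing in for the minimum cost flow computation of $\min_{MM_r(i,j)} \gamma(MM_r(i, j))$. The helper names are hypothetical.

```python
from functools import lru_cache

def gamma(a, b):
    """Assumed unit cost; None plays the role of the null label lambda."""
    return 0 if a == b else 1

@lru_cache(maxsize=None)
def p_del(p):
    """D_r(P[i], theta): delete the whole pattern subtree."""
    return gamma(p[0], None) + sum(p_del(c) for c in p[1])

def pf_del(p):
    """D_r(Pf[i], theta): delete all children subtrees of p."""
    return sum(p_del(c) for c in p[1])

def min_matching_r(pk, tk):
    """min over MM_r(i,j): unmatched data subtrees are removed at no cost."""
    @lru_cache(maxsize=None)
    def go(s, used):
        if s == len(pk):
            return 0
        best = p_del(pk[s]) + go(s + 1, used)
        for t in range(len(tk)):
            if not used >> t & 1:
                best = min(best, dr_tree(pk[s], tk[t])
                           + go(s + 1, used | 1 << t))
        return best
    return go(0, 0)

@lru_cache(maxsize=None)
def dr_forest(p, t):
    """D_r(Pf[i], Tf[j]) as in Lemma 11."""
    pk, tk = p[1], t[1]
    if not pk:
        return 0                      # D_r(theta, Tf[j]) = 0
    if not tk:
        return pf_del(p)
    a = min(dr_forest(p, c) + gamma(None, c[0]) for c in tk)
    b = pf_del(p) + min(dr_forest(c, t) - pf_del(c) for c in pk)
    return min(a, b, min_matching_r(pk, tk))

@lru_cache(maxsize=None)
def dr_tree(p, t):
    """D_r(P[i], T[j]) as in Lemma 10."""
    pk, tk = p[1], t[1]
    best = min(p_del(p), dr_forest(p, t) + gamma(p[0], t[0]))
    if tk:
        best = min(best, gamma(None, t[0]) + min(dr_tree(p, c) for c in tk))
    if pk:
        best = min(best, p_del(p)
                   + min(dr_tree(c, t) - p_del(c) for c in pk))
    return best

def subtrees(t):
    yield t
    for c in t[1]:
        yield from subtrees(c)

def best_match(p, t):
    """min over i of D_r(P, T[i])."""
    return min(dr_tree(p, s) for s in subtrees(t))

leaf = lambda x: (x, ())
P = ("b", (leaf("c"),))
T = ("a", (("b", (leaf("d"), leaf("c"))), leaf("e")))
print(dr_tree(P, T), best_match(P, T))
```

In this example the pattern occurs exactly inside the data tree once the leaf `d` is removed, so the best value over all subtrees is reached at the subtree rooted at `b`, while matching against the whole data tree still pays for inserting the root `a`.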

7. Conclusion. Motivated by the NP-complete [3], [19] and MAX SNP-hard results [16], we have defined a constrained edit distance metric between unordered labeled trees. We present an algorithm for computing this distance metric based on a reduction to the minimum cost maximum flow problem. Our algorithm is generalizable with the same complexity to the approximate unordered-tree-matching problem.

The work presented is part of a project to develop a comprehensive tool for approximate tree pattern matching [14]. The proposed algorithms and their implementation are currently integrated into this tool.

Acknowledgments. The author would like to thank the anonymous referees for their many helpful suggestions.

References

[1] S. Arora, C. Lund, R. Motwani, M. Sudan, and M. Szegedy, Proof verification and hardness of approximation problems, Proc. 33rd IEEE Symp. on the Foundation of Computer Science, 1992, pp. 14-23.

[2] G. M. Landau and U. Vishkin, Fast parallel and serial approximate string matching, J. Algorithms, 10 (1989), 157-169.

[3] P. Kilpelainen and H. Mannila, The tree inclusion problem, Proc. Internat. Joint Conf. on the Theory and Practice of Software Development (CAAP '91), 1991, Vol. 1, pp. 202-214.

[4] S. Masuyama, Y. Takahashi, T. Okuyama, and S. Sasaki, On the largest common subgraph problem, Algorithms and Computing Theory, RIMS, Kokyuroku (Kyoto University), 1990, pp. 195-201.

[5] P. H. Sellers, The theory and computation of evolutionary distances, J. Algorithms, 1 (1980), 359-373.

[6] B. Shapiro and K. Zhang, Comparing multiple RNA secondary structures using tree comparisons, Comput. Appl. Biosci., 6(4) (1990), 309-318.

[7] F. Y. Shih, Object representation and recognition using mathematical morphology model, J. System Integration, 1 (1991), 235-256.

[8] F. Y. Shih and O. R. Mitchell, Threshold decomposition of grayscale morphology into binary morphology, IEEE Trans. Pattern Anal. Mach. Intell., 11 (1989), 31-42.

[9] K. C. Tai, The tree-to-tree correction problem, J. Assoc. Comput. Mach., 26 (1979), 422-433.

[10] Y. Takahashi, Y. Satoh, H. Suzuki, and S. Sasaki, Recognition of largest common structural fragment among a variety of chemical structures, Anal. Sci., 3 (1987), 23-28.

[11] E. Tanaka and K. Tanaka, The tree-to-tree editing problem, Internat. J. Pattern Recog. Artificial Intell., 2(2) (1988), 221-240.

[12] R. E. Tarjan, Data Structures and Network Algorithms, CBMS-NSF Regional Conference Series in Applied Mathematics, CBMS, Washington, DC, 1983.

[13] E. Ukkonen, Finding approximate patterns in strings, J. Algorithms, 6 (1985), 132-137.

[14] J. T. L. Wang, Kaizhong Zhang, Karpjoo Jeong, and D. Shasha, ATBE: a system for approximate tree matching, IEEE Trans. Knowledge Data Engrg, 6(4) (1994), 559-571.

[15] Kaizhong Zhang, Algorithms for the Constrained Editing Distance Between Ordered Labeled Trees and Related Problems, Technical Report No. 361, Department of Computer Science, University of Western Ontario, 1993.

[16] Kaizhong Zhang and Tao Jiang, Some MAX SNP-hard results concerning unordered labeled trees, Inform. Process. Lett., 49 (1994), 249-254.

[17] Kaizhong Zhang and D. Shasha, Simple fast algorithms for the editing distance between trees and related problems, SIAM J. Comput., 18(6) (1989), 1245-1262.

[18] Kaizhong Zhang, D. Shasha, and J. Wang, Approximate tree matching in the presence of variable length don't cares, J. Algorithms, 16 (1994), 33-66.

[19] Kaizhong Zhang, R. Statman, and D. Shasha, On the editing distance between unordered labeled trees, Inform. Process. Lett., 42 (1992), 133-139.