6600089 handbook of exact string matching algorithms

Upload: endang-santoso

Post on 07-Apr-2018


  • 8/6/2019 6600089 Handbook of Exact String Matching Algorithms

    1/220

    Handbook of Exact String-Matching Algorithms

    Christian Charras, Thierry Lecroq

    1 Introduction
        1.1 From left to right
        1.2 From right to left
        1.3 In a specific order
        1.4 In any order
        1.5 Conventions
            Definitions
            Implementations
    2 Brute force algorithm
        2.1 Main features
        2.2 Description
        2.3 The C code
        2.4 The example
    3 Search with an automaton
        3.1 Main features
        3.2 Description
        3.3 The C code
        3.4 The example
        3.5 References
    4 Karp-Rabin algorithm
        4.1 Main features
        4.2 Description
        4.3 The C code
        4.4 The example
        4.5 References
    5 Shift Or algorithm
        5.1 Main features
        5.2 Description
        5.3 The C code
        5.4 The example
        5.5 References
    6 Morris-Pratt algorithm
        6.1 Main features
        6.2 Description
        6.3 The C code
        6.4 The example
        6.5 References

    7 Knuth-Morris-Pratt algorithm
        7.1 Main features
        7.2 Description
        7.3 The C code
        7.4 The example
        7.5 References
    8 Simon algorithm
        8.1 Main features
        8.2 Description
        8.3 The C code
        8.4 The example
        8.5 References
    9 Colussi algorithm
        9.1 Main features
        9.2 Description
        9.3 The C code
        9.4 The example
        9.5 References
    10 Galil-Giancarlo algorithm
        10.1 Main features
        10.2 Description
        10.3 The C code
        10.4 The example
        10.5 References
    11 Apostolico-Crochemore algorithm
        11.1 Main features
        11.2 Description
        11.3 The C code
        11.4 The example
        11.5 References
    12 Not So Naive algorithm
        12.1 Main features
        12.2 Description
        12.3 The C code
        12.4 The example
        12.5 References
    13 Forward Dawg Matching algorithm
        13.1 Main features
        13.2 Description

        13.3 The C code
        13.4 The example
        13.5 References
    14 Boyer-Moore algorithm
        14.1 Main features
        14.2 Description
        14.3 The C code
        14.4 The example
        14.5 References
    15 Turbo-BM algorithm
        15.1 Main features
        15.2 Description
        15.3 The C code
        15.4 The example
        15.5 References
    16 Apostolico-Giancarlo algorithm
        16.1 Main features
        16.2 Description
        16.3 The C code
        16.4 The example
        16.5 References
    17 Reverse Colussi algorithm
        17.1 Main features
        17.2 Description
        17.3 The C code
        17.4 The example
        17.5 References
    18 Horspool algorithm
        18.1 Main features
        18.2 Description
        18.3 The C code
        18.4 The example
        18.5 References
    19 Quick Search algorithm
        19.1 Main features
        19.2 Description
        19.3 The C code
        19.4 The example
        19.5 References

    20 Tuned Boyer-Moore algorithm
        20.1 Main features
        20.2 Description
        20.3 The C code
        20.4 The example
        20.5 References
    21 Zhu-Takaoka algorithm
        21.1 Main features
        21.2 Description
        21.3 The C code
        21.4 The example
        21.5 References
    22 Berry-Ravindran algorithm
        22.1 Main features
        22.2 Description
        22.3 The C code
        22.4 The example
        22.5 References
    23 Smith algorithm
        23.1 Main features
        23.2 Description
        23.3 The C code
        23.4 The example
        23.5 References
    24 Raita algorithm
        24.1 Main features
        24.2 Description
        24.3 The C code
        24.4 The example
        24.5 References
    25 Reverse Factor algorithm
        25.1 Main features
        25.2 Description
        25.3 The C code
        25.4 The example
        25.5 References
    26 Turbo Reverse Factor algorithm
        26.1 Main features
        26.2 Description

    33 Skip Search algorithm
        33.1 Main features
        33.2 Description
        33.3 The C code
        33.4 The example
        33.5 References
    34 KmpSkip Search algorithm
        34.1 Main features
        34.2 Description
        34.3 The C code
        34.4 The example
        34.5 References
    35 Alpha Skip Search algorithm
        35.1 Main features
        35.2 Description
        35.3 The C code
        35.4 The example
        35.5 References
    A Example of graph implementation


    1 Introduction

    String-matching is a very important subject in the wider domain of text processing. String-matching algorithms are basic components used in implementations of practical software existing under most operating systems. Moreover, they emphasize programming methods that serve as paradigms in other fields of computer science (system or software design). Finally, they also play an important role in theoretical computer science by providing challenging problems.

    Although data are memorized in various ways, text remains the main form to exchange information. This is particularly evident in literature or linguistics where data are composed of huge corpora and dictionaries. This applies as well to computer science where a large amount of data is stored in linear files. And this is also the case, for instance, in molecular biology because biological molecules can often be approximated as sequences of nucleotides or amino acids. Furthermore, the quantity of available data in these fields tends to double every eighteen months. This is the reason why algorithms should be efficient even if the speed and capacity of storage of computers increase regularly.

    String-matching consists in finding one, or more generally, all the occurrences of a string (more generally called a pattern) in a text. All the algorithms in this book output all occurrences of the pattern in the text. The pattern is denoted by $x = x[0 \ldots m-1]$; its length is equal to $m$. The text is denoted by $y = y[0 \ldots n-1]$; its length is equal to $n$. Both strings are built over a finite set of characters called an alphabet, denoted by $\Sigma$, whose size is equal to $\sigma$.

    Applications require two kinds of solution depending on which string, the pattern or the text, is given first. Algorithms based on the use of automata or combinatorial properties of strings are commonly implemented to preprocess the pattern and solve the first kind of problem. The notion of indexes realized by trees or automata is used in the second kind of solutions. This book will only investigate algorithms of the first kind.

    String-matching algorithms of the present book work as follows. They


    2 Brute force algorithm

    2.1 Main features

    no preprocessing phase
    constant extra space needed
    always shifts the window by exactly 1 position to the right
    comparisons can be done in any order
    searching phase in $O(mn)$ time complexity
    $2n$ expected text character comparisons.

    2.2 Description

    The brute force algorithm consists in checking, at all positions in the text between $0$ and $n-m$, whether an occurrence of the pattern starts there or not. Then, after each attempt, it shifts the pattern by exactly one position to the right.

    The brute force algorithm requires no preprocessing phase, and a constant extra space in addition to the pattern and the text. During the searching phase the text character comparisons can be done in any order. The time complexity of this searching phase is $O(mn)$ (when searching for $a^{m-1}b$ in $a^n$ for instance). The expected number of text character comparisons is $2n$.
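    The description above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the book's own listing: the function name `brute_force` is hypothetical, and instead of an output routine it stores the match positions in a caller-supplied array `out` of capacity `cap`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the book): writes up to `cap` starting
   positions of x[0..m-1] in y[0..n-1] into out[] and returns the total
   number of occurrences found. */
size_t brute_force(const char *x, size_t m, const char *y, size_t n,
                   size_t *out, size_t cap) {
    size_t count = 0;
    if (m == 0 || m > n)
        return 0;
    /* Try every window position j between 0 and n - m. */
    for (size_t j = 0; j + m <= n; ++j) {
        size_t i = 0;
        /* Left-to-right comparison; any other order would also be valid. */
        while (i < m && x[i] == y[j + i])
            ++i;
        if (i == m) {
            if (count < cap)
                out[count] = j;
            ++count;
        }
        /* The loop increment shifts the window by exactly one position. */
    }
    return count;
}
```

    With x = GCAGAGAG and y = GCATCGCAGAGAGTATACAGTACG, the sketch reports a single occurrence at position 5. No preprocessing is done and only a few counters are kept, in line with the constant extra space stated above.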

    2.4 The example

    Second attempt:
    y  G C A T C G C A G A G A G T A T A C A G T A C G
         1
    x    G C A G A G A G
    Shift by 1

    Third attempt:
    y  G C A T C G C A G A G A G T A T A C A G T A C G
           1
    x      G C A G A G A G
    Shift by 1

    Fourth attempt:
    y  G C A T C G C A G A G A G T A T A C A G T A C G
             1
    x        G C A G A G A G
    Shift by 1

    Fifth attempt:
    y  G C A T C G C A G A G A G T A T A C A G T A C G
               1
    x          G C A G A G A G
    Shift by 1

    Sixth attempt:
    y  G C A T C G C A G A G A G T A T A C A G T A C G
                 1 2 3 4 5 6 7 8
    x            G C A G A G A G
    Shift by 1

    Seventh attempt:
    y  G C A T C G C A G A G A G T A T A C A G T A C G
                   1
    x              G C A G A G A G
    Shift by 1

    Eighth attempt:
    y  G C A T C G C A G A G A G T A T A C A G T A C G
                     1
    x                G C A G A G A G
    Shift by 1

    Ninth attempt:
    y  G C A T C G C A G A G A G T A T A C A G T A C G
                       1 2
    x                  G C A G A G A G
    Shift by 1

    Tenth attempt:
    y  G C A T C G C A G A G A G T A T A C A G T A C G
                         1
    x                    G C A G A G A G
    Shift by 1

    Eleventh attempt:
    y  G C A T C G C A G A G A G T A T A C A G T A C G
                           1 2
    x                      G C A G A G A G
    Shift by 1

    Twelfth attempt:
    y  G C A T C G C A G A G A G T A T A C A G T A C G
                             1
    x                        G C A G A G A G
    Shift by 1

    Thirteenth attempt:
    y  G C A T C G C A G A G A G T A T A C A G T A C G
                               1 2
    x                          G C A G A G A G
    Shift by 1


    3 Search with an automaton

    3.1 Main features

    builds the minimal Deterministic Finite Automaton recognizing the language $\Sigma^* x$
    extra space in $O(m\sigma)$ if the automaton is stored in a direct access table
    preprocessing phase in $O(m\sigma)$ time complexity
    searching phase in $O(n)$ time complexity if the automaton is stored in a direct access table, $O(n \log \sigma)$ otherwise.

    3.2 Description

    Searching a word $x$ with an automaton consists first in building the minimal Deterministic Finite Automaton (DFA) $A(x)$ recognizing the language $\Sigma^* x$.

    The DFA $A(x) = (Q, q_0, T, E)$ recognizing the language $\Sigma^* x$ is defined as follows:

    $Q$ is the set of all the prefixes of $x$: $Q = \{\varepsilon, x[0], x[0 \ldots 1], \ldots, x[0 \ldots m-2], x\}$
    $q_0 = \varepsilon$
    $T = \{x\}$
    for $q \in Q$ ($q$ is a prefix of $x$) and $a \in \Sigma$, $(q, a, qa) \in E$ if and only if $qa$ is also a prefix of $x$, otherwise $(q, a, p) \in E$ such that $p$ is the longest suffix of $qa$ which is a prefix of $x$.

    The DFA $A(x)$ can be constructed in $O(m+\sigma)$ time and $O(m\sigma)$ space.

    Once the DFA $A(x)$ is built, searching for a word $x$ in a text $y$ consists in parsing the text $y$ with the DFA $A(x)$ beginning with the initial state $q_0$. Each time the terminal state is encountered an occurrence of $x$ is reported.

    The searching phase can be performed in $O(n)$ time if the automaton is stored in a direct access table, in $O(n \log \sigma)$ otherwise.

    3.3 The C code

    void preAut(char *x, int m, Graph aut) {
       int i, state, target, oldTarget;

       for (state = getInitial(aut), i = 0; i < m; ++i) {
          oldTarget = getTarget(aut, state, x[i]);
          target = newVertex(aut);
          setTarget(aut, state, x[i], target);
          copyVertex(aut, target, oldTarget);
          state = target;
       }
       setTerminal(aut, state);
    }

    void AUT(char *x, int m, char *y, int n) {
       int j, state;
       Graph aut;

       /* Preprocessing */
       aut = newAutomaton(m + 1, (m + 1)*ASIZE);
       preAut(x, m, aut);

       /* Searching */
       for (state = getInitial(aut), j = 0; j < n; ++j) {
          state = getTarget(aut, state, y[j]);
          if (isTerminal(aut, state))
             OUTPUT(j - m + 1);
       }
    }
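    The listing above depends on a Graph interface (getInitial, getTarget, setTarget and so on) that is not defined in this chapter. As a self-contained illustration, the same automaton can be stored in a direct access table. The sketch below is a variant written for this purpose, not the authors' code: the names `build_dfa` and `dfa_search`, the bound `MAXM` on the pattern length, the 256-symbol alphabet and the output array `out` are all assumptions of the example. Each new row is copied from the state reached by the longest proper border, which plays the role of copyVertex in the listing above.

```c
#include <assert.h>

#define ASIZE 256   /* assumed alphabet size: all unsigned char values */
#define MAXM  64    /* assumed maximum pattern length for this sketch */

/* Fills delta[q][c] for states q = 0..m, where state q stands for the
   prefix of x of length q. Takes O(m * ASIZE) time and space. */
void build_dfa(const char *x, int m, int delta[][ASIZE]) {
    int q, c, fallback = 0;   /* fallback: state of the longest proper border */
    for (c = 0; c < ASIZE; ++c)
        delta[0][c] = 0;
    if (m > 0)
        delta[0][(unsigned char)x[0]] = 1;
    for (q = 1; q <= m; ++q) {
        /* On a mismatch, state q behaves like the fallback state. */
        for (c = 0; c < ASIZE; ++c)
            delta[q][c] = delta[fallback][c];
        if (q < m) {
            delta[q][(unsigned char)x[q]] = q + 1;
            fallback = delta[fallback][(unsigned char)x[q]];
        }
    }
}

/* Parses y with the automaton; stores up to cap match positions in out[]
   and returns the number of occurrences. State m is the terminal state. */
int dfa_search(const char *x, int m, const char *y, int n,
               int *out, int cap) {
    static int delta[MAXM + 1][ASIZE];
    int j, state = 0, count = 0;
    if (m <= 0 || m > MAXM)
        return 0;
    build_dfa(x, m, delta);
    for (j = 0; j < n; ++j) {
        state = delta[state][(unsigned char)y[j]];
        if (state == m) {
            if (count < cap)
                out[count] = j - m + 1;
            ++count;
        }
    }
    return count;
}
```

    The search loop inspects each text character exactly once, which is the $O(n)$ behaviour claimed for the direct access table. The price is the table itself, whose size grows with both the pattern length and the alphabet size.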

    3.4 The example

    $\Sigma = \{A, C, G, T\}$
    $Q = \{\varepsilon, G, GC, GCA, GCAG, GCAGA, GCAGAG, GCAGAGA, GCAGAGAG\}$
    $q_0 = \varepsilon$
    $T = \{GCAGAGAG\}$

    [Figure: the automaton for x = GCAGAGAG, with states 0 to 8. The forward transitions from state 0 to state 8 are labelled G, C, A, G, A, G, A, G; the other transitions drawn are five labelled G and three labelled C.]

    The states are labelled by the length of the prefix they are associated with. Missing transitions lead to the initial state 0.

    Searching phase

    The initial state is 0. The state reached after reading each text character is shown below that character.

    y      G C A T C G C A G A G A G T A T A C A G T A C G
    state  1 2 3 0 0 1 2 3 4 5 6 7 8 0 0 0 0 0 0 1 0 0 0 1

    The search by automaton performs exactly 24 text character inspections on the example.

    3.5 References

    Cormen, T.H., Leiserson, C.E., Rivest, R.L., 1990, Introduction to Algorithms, Chapter 34, pp 853-885, MIT Press.
    Crochemore, M., 1997, Off-line serial exact string searching, in Pattern Matching Algorithms, A. Apostolico and Z. Galil ed., Chapter 1, pp 1-53, Oxford University Press.
    Crochemore, M., Hancart, C., 1997, Automata for Matching Patterns, in Handbook of Formal Languages, Volume 2, Linear Modeling: Background and Application, G. Rozenberg and A. Salomaa ed., Chapter 9, pp 399-462, Springer-Verlag, Berlin.
    Gonnet, G.H., Baeza-Yates, R.A., 1991, Handbook of Algorithms and Data Structures in Pascal and C, 2nd Edition, Chapter 7, pp 251-288, Addison-Wesley Publishing Company.
    Hancart, C., 1993, Analyse exacte et en moyenne d'algorithmes de recherche d'un motif dans un texte, Thèse de doctorat de l'Université de Paris 7, France.

    4 Karp-Rabin algorithm

    4.1 Main features

    uses a hashing function
    preprocessing phase in $O(m)$ time complexity and constant space
    searching phase in $O(mn)$ time complexity
    $O(m+n)$ expected running time.

    4.2 Description

    Hashing provides a simple method to avoid a quadratic number of character comparisons in most practical situations. Instead of checking at each position of the text if the pattern occurs, it seems to be more efficient to check only if the contents of the window look like the pattern. In order to check the resemblance between these two words a hashing function is used. To be helpful for the string matching problem a hashing function $hash$ should have the following properties:

    efficiently computable
    highly discriminating for strings
    $hash(y[j+1 \ldots j+m])$ must be easily computable from $hash(y[j \ldots j+m-1])$ and $y[j+m]$:
    $hash(y[j+1 \ldots j+m]) = rehash(y[j], y[j+m], hash(y[j \ldots j+m-1]))$

    For a word $w$ of length $m$ let $hash(w)$ be defined as follows:

    $hash(w[0 \ldots m-1]) = (w[0] \cdot 2^{m-1} + w[1] \cdot 2^{m-2} + \cdots + w[m-1] \cdot 2^{0}) \bmod q$

    where $q$ is a large number. Then,

    $rehash(a, b, h) = ((h - a \cdot 2^{m-1}) \cdot 2 + b) \bmod q$
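    To make the two definitions concrete, the following sketch evaluates hash directly and then uses rehash to slide the window by one position. It is an illustration only: the names `kr_hash`, `kr_rehash` and `kr_top_weight`, and the modulus Q = 2^31 - 1, are assumptions chosen for the example and are not prescribed by the text.

```c
#include <assert.h>
#include <stdint.h>

#define Q 2147483647ULL   /* assumed modulus for this sketch: 2^31 - 1 */

/* hash(w[0..m-1]) = (w[0]*2^(m-1) + ... + w[m-1]*2^0) mod Q,
   evaluated with Horner's rule. */
uint64_t kr_hash(const char *w, int m) {
    uint64_t h = 0;
    for (int i = 0; i < m; ++i)
        h = (h * 2 + (unsigned char)w[i]) % Q;
    return h;
}

/* Weight of the leftmost character of a window: 2^(m-1) mod Q. */
uint64_t kr_top_weight(int m) {
    uint64_t d = 1;
    for (int i = 1; i < m; ++i)
        d = (d * 2) % Q;
    return d;
}

/* rehash(a, b, h) = ((h - a*2^(m-1))*2 + b) mod Q, where d = 2^(m-1) mod Q.
   a is the character leaving the window, b the character entering it. */
uint64_t kr_rehash(unsigned char a, unsigned char b, uint64_t h, uint64_t d) {
    uint64_t drop = ((uint64_t)a * d) % Q;
    uint64_t rest = (h + Q - drop) % Q;   /* add Q so the difference stays non-negative */
    return (rest * 2 + b) % Q;
}
```

    Assuming ASCII character codes, `kr_hash("GCAGAGAG", 8)` evaluates to 17597, and rehashing the window GCATCGCA (hash 17819) with a = G leaving and b = G entering gives 17533, the hash of CATCGCAG. Each slide costs a constant number of operations, whatever the window length.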

    The preprocessing phase of the Karp-Rabin algorithm consists in computing $hash(x)$. It can be done in constant space and $O(m)$ time.

    During the searching phase, it is enough to compare $hash(x)$ with $hash(y[j \ldots j+m-1])$ for $0 \le j \le n-m$. If an equality is found, it is still necessary to check the equality $x = y[j \ldots j+m-1]$ character by character.

    The time complexity of the searching phase of the Karp-Rabin algorithm is $O(mn)$ (when searching for $a^m$ in $a^n$ for instance). Its expected number of text character comparisons is $O(m+n)$.

    4.3 The C code

    In the following function KR all the multiplications by 2 are implemented by shifts. Furthermore, the computation of the modulus function is avoided by using the implicit modular arithmetic given by the hardware that forgets carries in integer operations. So, q is chosen as the maximum value for an integer.

    #define REHASH(a, b, h) ((((h) - (a)*d) << 1) + (b))

    void KR(char *x, int m, char *y, int n) {
       int d, hx, hy, i, j;

       /* Preprocessing */
       /* computes d = 2^(m-1) with
          the left-shift operator */
       for (d = i = 1; i < m; ++i)
          d = (d<<1);

       for (hy = hx = i = 0; i < m; ++i) {
          hx = ((hx<<1) + x[i]);
          hy = ((hy<<1) + y[i]);
       }

       /* Searching */
       j = 0;
       while (j <= n-m) {
          if (hx == hy && memcmp(x, y + j, m) == 0)
             OUTPUT(j);
          /* slide the window: drop y[j] and append y[j+m]
             before moving on to position j+1 */
          hy = REHASH(y[j], y[j + m], hy);
          ++j;
       }
    }
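    For readers who want to compile the function directly, here is a self-contained variant. It is a sketch under stated assumptions rather than the book's listing: the name `kr_search` is hypothetical, OUTPUT is replaced by a caller-supplied array `out` of capacity `cap`, and the hash values are kept in `unsigned int` so that the discarded carries are well defined in C (unsigned arithmetic wraps around, whereas signed overflow is undefined behaviour).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical variant of KR: stores up to cap match positions in out[]
   and returns the number of occurrences of x[0..m-1] in y[0..n-1]. */
int kr_search(const char *x, int m, const char *y, int n,
              int *out, int cap) {
    unsigned int d = 1, hx = 0, hy = 0;
    int i, j, count = 0;
    if (m <= 0 || m > n)
        return 0;

    /* Preprocessing: d = 2^(m-1), reduced modulo 2^w by wraparound. */
    for (i = 1; i < m; ++i)
        d <<= 1;
    for (i = 0; i < m; ++i) {
        hx = (hx << 1) + (unsigned char)x[i];
        hy = (hy << 1) + (unsigned char)y[i];
    }

    /* Searching: compare hash values first, characters only on equality. */
    for (j = 0; j <= n - m; ++j) {
        if (hx == hy && memcmp(x, y + j, (size_t)m) == 0) {
            if (count < cap)
                out[count] = j;
            ++count;
        }
        if (j < n - m)   /* rehash only while y[j + m] is inside the text */
            hy = ((hy - (unsigned char)y[j] * d) << 1)
                 + (unsigned char)y[j + m];
    }
    return count;
}
```

    The guard before the rehash step means the text does not need a terminating character after position n - 1. On the example of this chapter the sketch reports one occurrence, at position 5.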

    4.4 The example

    hash(GCAGAGAG) = 17597

    Searching phase

    First attempt:
    y  G C A T C G C A G A G A G T A T A C A G T A C G
    x  G C A G A G A G
    hash(y[0..7]) = 17819

    Second attempt:
    y  G C A T C G C A G A G A G T A T A C A G T A C G
    x    G C A G A G A G
    hash(y[1..8]) = 17533

    Third attempt:
    y  G C A T C G C A G A G A G T A T A C A G T A C G
    x      G C A G A G A G
    hash(y[2..9]) = 17979

    Fourth attempt:
    y  G C A T C G C A G A G A G T A T A C A G T A C G
    x        G C A G A G A G
    hash(y[3..10]) = 19389

    Fifth attempt:
    y  G C A T C G C A G A G A G T A T A C A G T A C G
    x          G C A G A G A G
    hash(y[4..11]) = 17339

    Sixth attempt:
    y  G C A T C G C A G A G A G T A T A C A G T A C G
                 1 2 3 4 5 6 7 8
    x            G C A G A G A G
    hash(y[5..12]) = 17597 = hash(x), thus 8 character comparisons are necessary to be sure that an occurrence of the pattern has been found.

    Seventh attempt:
    y  G C A T C G C A G A G A G T A T A C A G T A C G
    x              G C A G A G A G
    hash(y[6..13]) = 17102

    Eighth attempt:
    y  G C A T C G C A G A G A G T A T A C A G T A C G
    x                G C A G A G A G
    hash(y[7..14]) = 17117

    Ninth attempt:
    y  G C A T C G C A G A G A G T A T A C A G T A C G
    x                  G C A G A G A G
    hash(y[8..15]) = 17678

    Tenth attempt:
    y  G C A T C G C A G A G A G T A T A C A G T A C G
    x                    G C A G A G A G
    hash(y[9..16]) = 17245

    Eleventh attempt:
    y  G C A T C G C A G A G A G T A T A C A G T A C G
    x                      G C A G A G A G
    hash(y[10..17]) = 17917

    Twelfth attempt:
    y  G C A T C G C A G A G A G T A T A C A G T A C G
    x                        G C A G A G A G
    hash(y[11..18]) = 17723

  • 8/6/2019 6600089 Handbook of Exact String Matching Algorithms

    35/220

    4 . 5 R e f e r e n c e s 3 5

    T h i r t e e n t h a t t e m p t :

    y G C A T C G C A G A G A G T A T A C A G T A C G

    x G C A G A G A G

    h a s h ( y 1 2 1 9 ] ) = 1 8 8 7 7

    F o u r t e e n t h a t t e m p t :

    y G C A T C G C A G A G A G T A T A C A G T A C G

    x G C A G A G A G

    h a s h ( y 1 3 2 0 ] ) = 1 9 6 6 2

    F i f t e e n t h a t t e m p t :

    y G C A T C G C A G A G A G T A T A C A G T A C G

    x G C A G A G A G

    h a s h ( y 1 4 2 1 ] ) = 1 7 8 8 5

    S i x t e e n t h a t t e m p t :

    y G C A T C G C A G A G A G T A T A C A G T A C G

    x G C A G A G A G

    h a s h ( y 1 5 2 2 ] ) = 1 9 1 9 7

    S e v e n t e e n t h a t t e m p t :

    y G C A T C G C A G A G A G T A T A C A G T A C G

    x G C A G A G A G

    h a s h ( y 1 6 2 3 ] ) = 1 6 9 6 1

    T h e K a r p - R a b i n a l g o r i t h m p e r f o r m s 1 7 c o m p a r i s o n s o n h a s h i n g v a l u e s

    a n d 8 c h a r a c t e r c o m p a r i s o n s o n t h e e x a m p l e .
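The hash values listed for each window can be reproduced with a rolling hash. The sketch below assumes that the hash of a window is the sum of its character codes weighted by decreasing powers of two (ASCII codes, unsigned arithmetic that wraps on overflow); under that assumption it yields the values shown in the attempts above. The names kr_hash and kr_window_hashes are illustrative only.

```c
#include <assert.h>

/* Hash of s[0..m-1]: sum of s[i] * 2^(m-1-i), wrapping on overflow. */
static unsigned kr_hash(const char *s, int m) {
    unsigned h = 0;
    for (int i = 0; i < m; ++i)
        h = (h << 1) + (unsigned char)s[i];
    return h;
}

/* Stores the hash of every window y[j..j+m-1] in out[j] and returns the
   number of windows. Each hash is derived from the previous one. */
static int kr_window_hashes(const char *y, int n, int m, unsigned out[]) {
    assert(m >= 1 && m <= 32 && m <= n);
    unsigned d = 1u << (m - 1);      /* weight of the leftmost character */
    unsigned h = kr_hash(y, m);
    for (int j = 0; j + m <= n; ++j) {
        out[j] = h;
        if (j + m < n)               /* drop y[j], shift, append y[j+m] */
            h = ((h - (unsigned char)y[j] * d) << 1) + (unsigned char)y[j + m];
    }
    return n - m + 1;
}
```

On the example text, out[6] is 17102 and out[16] is 16961, which correspond to the seventh and seventeenth attempts.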

    4.5 References

    Aho, A.V., 1990, Algorithms for Finding Patterns in Strings, in Handbook of Theoretical Computer Science, Volume A, Algorithms and complexity, J. van Leeuwen ed., Chapter 5, pp 255-300, Elsevier, Amsterdam.

    Cormen, T.H., Leiserson, C.E., Rivest, R.L., 1990, Introduction to Algorithms, Chapter 34, pp 853-885, MIT Press.

    Crochemore, M., Hancart, C., 1999, Pattern Matching in Strings, in Algorithms and Theory of Computation Handbook, M.J. Atallah ed., Chapter 11, pp 11-1 to 11-28, CRC Press Inc., Boca Raton, FL.

    Gonnet, G.H., Baeza-Yates, R.A., 1991, Handbook of Algorithms and Data Structures in Pascal and C, 2nd Edition, Chapter 7, pp. 251-288, Addison-Wesley Publishing Company.

    Hancart, C., 1993, Analyse exacte et en moyenne d'algorithmes de recherche d'un motif dans un texte, Thèse de doctorat de l'Université de Paris 7, France.

    Crochemore, M., Lecroq, T., 1996, Pattern matching and text compression algorithms, in CRC Computer Science and Engineering Handbook, A.B. Tucker Jr ed., Chapter 8, pp 162-202, CRC Press Inc., Boca Raton, FL.

    Karp, R.M., Rabin, M.O., 1987, Efficient randomized pattern-matching algorithms, IBM Journal on Research Development 31(2):249-260.

    Sedgewick, R., 1988, Algorithms, Chapter 19, pp. 277-292, Addison-Wesley Publishing Company.

    Sedgewick, R., 1992, Algorithms in C, Chapter 19, Addison-Wesley Publishing Company.

    Stephen, G.A., 1994, String Searching Algorithms, World Scientific.


    5 Shift Or algorithm

    5.1 Main features

    - uses bitwise techniques;
    - efficient if the pattern length is no longer than the memory-word size of the machine;
    - preprocessing phase in $O(m + \sigma)$ time and space complexity;
    - searching phase in $O(n)$ time complexity (independent of the alphabet size and the pattern length);
    - adapts easily to approximate string-matching.

    5.2 Description

    The Shift Or algorithm uses bitwise techniques. Let $R$ be a bit array of size $m$. Vector $R_j$ is the value of the array $R$ after text character $y[j]$ has been processed (see figure 5.1). It contains information about all matches of prefixes of $x$ that end at position $j$ in the text. For $0 \le i \le m-1$:

    $$R_j[i] = \begin{cases} 0 & \text{if } x[0..i] = y[j-i..j], \\ 1 & \text{otherwise.} \end{cases}$$

    The vector $R_{j+1}$ can be computed after $R_j$ as follows. For each $R_j[i] = 0$:

    $$R_{j+1}[i+1] = \begin{cases} 0 & \text{if } x[i+1] = y[j+1], \\ 1 & \text{otherwise,} \end{cases}$$

    and

    $$R_{j+1}[0] = \begin{cases} 0 & \text{if } x[0] = y[j+1], \\ 1 & \text{otherwise.} \end{cases}$$

    If $R_{j+1}[m-1] = 0$ then a complete match can be reported.


    [Figure 5.1: Meaning of vector $R_j$ in the Shift-Or algorithm. Entry $i$ of $R_j$ is 0 exactly when the prefix $x[0..i]$ ends at position $j$ of $y$.]

    The transition from $R_j$ to $R_{j+1}$ can be computed very fast as follows. For each $c \in \Sigma$, let $S_c$ be a bit array of size $m$ such that:

    $$\text{for } 0 \le i \le m-1, \quad S_c[i] = 0 \text{ if and only if } x[i] = c.$$

    The array $S_c$ denotes the positions of the character $c$ in the pattern $x$. Each $S_c$ can be preprocessed before the search, and the computation of $R_{j+1}$ reduces to two operations, shift and or:

    $$R_{j+1} = \mathrm{Shift}(R_j) \;\mathrm{Or}\; S_{y[j+1]}.$$

    Assuming that the pattern length is no longer than the memory-word size of the machine, the space and time complexity of the preprocessing phase is $O(m + \sigma)$. The time complexity of the searching phase is $O(n)$, thus independent of the alphabet size and the pattern length. The Shift Or algorithm adapts easily to approximate string-matching.
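The recurrence $R_{j+1} = \mathrm{Shift}(R_j)\ \mathrm{Or}\ S_{y[j+1]}$ can be sketched as a small self-contained function. This is a minimal illustration under stated assumptions: characters are 8 bits wide, the pattern has at most 32 characters, and the state is kept in a uint32_t. The name shift_or_first is hypothetical and the function stops at the first occurrence instead of reporting all of them.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the start of the first occurrence of x (length m) in
   y (length n), or -1 if there is none. */
static int shift_or_first(const char *x, int m, const char *y, int n) {
    uint32_t S[256];
    uint32_t state = ~(uint32_t)0;      /* all ones: no prefix matched yet */
    assert(m >= 1 && m <= 32);

    /* S[c] has bit i cleared exactly when x[i] == c. */
    for (int c = 0; c < 256; ++c)
        S[c] = ~(uint32_t)0;
    for (int i = 0; i < m; ++i)
        S[(unsigned char)x[i]] &= ~((uint32_t)1 << i);

    uint32_t match_bit = (uint32_t)1 << (m - 1);
    for (int j = 0; j < n; ++j) {
        /* The shift brings in a 0 at bit 0 (the empty prefix always
           matches); the OR keeps it only if x[0] == y[j]. */
        state = (state << 1) | S[(unsigned char)y[j]];
        if ((state & match_bit) == 0)   /* bit m-1 is 0: whole pattern matched */
            return j - m + 1;
    }
    return -1;
}
```

Testing bit m-1 directly is a design choice of this sketch; comparing the state against a precomputed limit is an equivalent way to detect the same condition.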

    5.3 The C code

    /* Builds the masks S[c] (bit i of S[c] is 0 iff x[i] == c) and
       returns the limit below which a state denotes a complete match. */
    unsigned int preSo(char *x, int m, unsigned int S[]) {
       unsigned int j, lim;
       int i;

       for (i = 0; i < ASIZE; ++i)
          S[i] = ~0;
       for (lim = i = 0, j = 1; i < m; ++i, j <<= 1) {
          S[x[i]] &= ~j;
          lim |= j;
       }
       lim = ~(lim>>1);
       return(lim);
    }

    void SO(char *x, int m, char *y, int n) {
       unsigned int lim, state;
       unsigned int S[ASIZE];
       int j;

       if (m > WORD)
          error("SO: Use pattern size <= word size");

       /* Preprocessing */
       lim = preSo(x, m, S);

       /* Searching */
       for (state = ~0, j = 0; j < n; ++j) {
          state = (state<<1) | S[y[j]];
          if (state < lim)
             OUTPUT(j - m + 1);
       }
    }

    5.4 The example

    x[i]   S_A  S_C  S_G  S_T
    G       1    1    0    1
    C       1    0    1    1
    A       0    1    1    1
    G       1    1    0    1
    A       0    1    1    1
    G       1    1    0    1
    A       0    1    1    1
    G       1    1    0    1

    j         0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
    y[j]      G  C  A  T  C  G  C  A  G  A  G  A  G  T  A  T  A  C  A  G  T  A  C  G
    i=0 G     0  1  1  1  1  0  1  1  0  1  0  1  0  1  1  1  1  1  1  0  1  1  1  0
    i=1 C     1  0  1  1  1  1  0  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
    i=2 A     1  1  0  1  1  1  1  0  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
    i=3 G     1  1  1  1  1  1  1  1  0  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
    i=4 A     1  1  1  1  1  1  1  1  1  0  1  1  1  1  1  1  1  1  1  1  1  1  1  1
    i=5 G     1  1  1  1  1  1  1  1  1  1  0  1  1  1  1  1  1  1  1  1  1  1  1  1
    i=6 A     1  1  1  1  1  1  1  1  1  1  1  0  1  1  1  1  1  1  1  1  1  1  1  1
    i=7 G     1  1  1  1  1  1  1  1  1  1  1  1  0  1  1  1  1  1  1  1  1  1  1  1

    As $R_{12}[7] = 0$ it means that an occurrence of x has been found at position 12 - 8 + 1 = 5.
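A column of the table above can be recomputed by running the shift and or steps up to a given text position. The following sketch assumes 8-bit characters and a pattern of at most 32 characters; the function name shift_or_column and the '0'/'1' string output are choices made for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Writes R_j as m characters '0'/'1', entry i = 0 first, then a NUL. */
static void shift_or_column(const char *x, int m, const char *y, int j, char out[]) {
    uint32_t S[256];
    uint32_t state = ~(uint32_t)0;
    assert(m >= 1 && m <= 32 && j >= 0);

    memset(S, 0xFF, sizeof S);                 /* every mask starts as all ones */
    for (int i = 0; i < m; ++i)
        S[(unsigned char)x[i]] &= ~((uint32_t)1 << i);

    for (int k = 0; k <= j; ++k)               /* process y[0..j] */
        state = (state << 1) | S[(unsigned char)y[k]];

    for (int i = 0; i < m; ++i)                /* read entries 0..m-1 */
        out[i] = ((state >> i) & 1u) ? '1' : '0';
    out[m] = '\0';
}
```

For x = GCAGAGAG and the example text, the column for j = 12 comes out as 01111110: entry 0 is 0 because y[12] = G, and entry 7 is 0 because the whole pattern ends there.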


    5.5 References

    Baeza-Yates, R.A., Gonnet, G.H., 1992, A new approach to text searching, Communications of the ACM 35(10):74-82.

    Baeza-Yates, R.A., Navarro, G., Ribeiro-Neto, B., 1999, Indexing and Searching, in Modern Information Retrieval, Chapter 8, pp 191-228, Addison-Wesley.

    Crochemore, M., Lecroq, T., 1996, Pattern matching and text compression algorithms, in CRC Computer Science and Engineering Handbook, A.B. Tucker Jr ed., Chapter 8, pp 162-202, CRC Press Inc., Boca Raton, FL.

    Gonnet, G.H., Baeza-Yates, R.A., 1991, Handbook of Algorithms and Data Structures in Pascal and C, 2nd Edition, Chapter 7, pp. 251-288, Addison-Wesley Publishing Company.

    Wu, S., Manber, U., 1992, Fast text searching allowing errors, Communications of the ACM 35(10):83-91.


    6 Morris-Pratt algorithm

    6.1 Main Features

    - performs the comparisons from left to right;
    - preprocessing phase in $O(m)$ space and time complexity;
    - searching phase in $O(m + n)$ time complexity (independent of the alphabet size);
    - performs at most $2n - 1$ text character comparisons during the searching phase;
    - delay bounded by $m$.

    6.2 Description

    The design of the Morris-Pratt algorithm follows a tight analysis of the brute force algorithm (see chapter 2), and especially of the way the latter wastes the information gathered during the scan of the text.

    Let us look more closely at the brute force algorithm. It is possible to improve the length of the shifts and simultaneously remember some portions of the text that match the pattern. This saves comparisons between characters of the pattern and characters of the text and consequently increases the speed of the search.

    Consider an attempt at a left position $j$ on $y$, that is when the window is positioned on the text factor $y[j..j+m-1]$. Assume that the first mismatch occurs between $x[i]$ and $y[i+j]$ with $0 < i < m$. Then, $x[0..i-1] = y[j..i+j-1] = u$ and $a = x[i] \ne y[i+j] = b$. When shifting, it is reasonable to expect that a prefix $v$ of the pattern matches some suffix of the portion $u$ of the text. The longest such prefix $v$ is called the border of $u$ (it occurs at both ends of $u$). This introduces the notation: let mpNext[i] be the length of the longest border of $x[0..i-1]$ for $0 < i \le m$. Then, after a shift, the comparisons can resume between characters $c = x[\mathrm{mpNext}[i]]$ and $y[i+j] = b$ without missing any occurrence of $x$


          j++;
          if (i >= m) {
             OUTPUT(j - i);
             i = mpNext[i];
          }
       }
    }

    6.4 The example

    i           0  1  2  3  4  5  6  7  8
    x[i]        G  C  A  G  A  G  A  G
    mpNext[i]  -1  0  0  0  1  0  1  0  1
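The mpNext row can be recomputed with a short border-chasing loop. The sketch below is one possible way to do it, written in the same style as preKmp in chapter 7; the name pre_mp is an assumption of this example, and the table needs m + 1 entries.

```c
#include <assert.h>

/* Fills mpNext[0..m]: mpNext[0] = -1 and, for 0 < i <= m, mpNext[i] is
   the length of the longest border of x[0..i-1]. */
static void pre_mp(const char *x, int m, int mpNext[]) {
    int i = 0;
    int j = -1;
    assert(m >= 1);
    mpNext[0] = -1;
    while (i < m) {
        /* Fall back to shorter borders until x[i] extends one of them. */
        while (j > -1 && x[i] != x[j])
            j = mpNext[j];
        mpNext[++i] = ++j;
    }
}
```

For x = GCAGAGAG this yields -1 0 0 0 1 0 1 0 1, the values of the table above.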

    Searching phase

    First attempt:
    y  GCATCGCAGAGAGTATACAGTACG
       1234
    x  GCAGAGAG
    Shift by 3 (i - mpNext[i] = 3 - 0)

    Second attempt:
    y  GCATCGCAGAGAGTATACAGTACG
          1
    x     GCAGAGAG
    Shift by 1 (i - mpNext[i] = 0 - (-1))

    Third attempt:
    y  GCATCGCAGAGAGTATACAGTACG
           1
    x      GCAGAGAG
    Shift by 1 (i - mpNext[i] = 0 - (-1))

    Fourth attempt:
    y  GCATCGCAGAGAGTATACAGTACG
            12345678
    x       GCAGAGAG
    Shift by 7 (i - mpNext[i] = 8 - 1)

    Fifth attempt:
    y  GCATCGCAGAGAGTATACAGTACG
                    1
    x              GCAGAGAG
    Shift by 1 (i - mpNext[i] = 1 - 0)

    Sixth attempt:
    y  GCATCGCAGAGAGTATACAGTACG
                    1
    x               GCAGAGAG
    Shift by 1 (i - mpNext[i] = 0 - (-1))

    Seventh attempt:
    y  GCATCGCAGAGAGTATACAGTACG
                     1
    x                GCAGAGAG
    Shift by 1 (i - mpNext[i] = 0 - (-1))

    Eighth attempt:
    y  GCATCGCAGAGAGTATACAGTACG
                      1
    x                 GCAGAGAG
    Shift by 1 (i - mpNext[i] = 0 - (-1))

    Ninth attempt:
    y  GCATCGCAGAGAGTATACAGTACG
                       1
    x                  GCAGAGAG
    Shift by 1 (i - mpNext[i] = 0 - (-1))

    The Morris-Pratt algorithm performs 19 text character comparisons on the example.

    6.5 References

    Aho, A.V., Hopcroft, J.E., Ullman, J.D., 1974, The design and analysis of computer algorithms, 2nd Edition, Chapter 9, pp. 317-361, Addison-Wesley Publishing Company.

    Beauquier, D., Berstel, J., Chrétienne, P., 1992, Éléments d'algorithmique, Chapter 10, pp 337-377, Masson, Paris.

    Crochemore, M., 1997, Off-line serial exact string searching, in Pattern Matching Algorithms, A. Apostolico and Z. Galil ed., Chapter 1, pp 1-53, Oxford University Press.

    Hancart, C., 1992, Une analyse en moyenne de l'algorithme de Morris et Pratt et de ses raffinements, in Théorie des Automates et Applications, Actes des 2e Journées Franco-Belges, D. Krob ed., Rouen, France, pp 99-110, PUR 176, Rouen, France.

    Hancart, C., 1993, Analyse exacte et en moyenne d'algorithmes de recherche d'un motif dans un texte, Thèse de doctorat de l'Université de Paris 7, France.

    Morris, Jr, J.H., Pratt, V.R., 1970, A linear pattern-matching algorithm, Technical Report 40, University of California, Berkeley.


    7 Knuth-Morris-Pratt algorithm

    7.1 Main Features

    - performs the comparisons from left to right;
    - preprocessing phase in $O(m)$ space and time complexity;
    - searching phase in $O(m + n)$ time complexity (independent of the alphabet size);
    - performs at most $2n - 1$ text character comparisons during the searching phase;
    - delay bounded by $\log_\Phi(m)$ where $\Phi$ is the golden ratio: $\Phi = \frac{1+\sqrt{5}}{2}$.

    7.2 Description

    The design of the Knuth-Morris-Pratt algorithm follows a tight analysis of the Morris-Pratt algorithm (see chapter 6). Let us look more closely at the Morris-Pratt algorithm. It is possible to improve the length of the shifts.

    Consider an attempt at a left position $j$, that is when the window is positioned on the text factor $y[j..j+m-1]$. Assume that the first mismatch occurs between $x[i]$ and $y[i+j]$ with $0 < i < m$. Then, $x[0..i-1] = y[j..i+j-1] = u$ and $a = x[i] \ne y[i+j] = b$. When shifting, it is reasonable to expect that a prefix $v$ of the pattern matches some suffix of the portion $u$ of the text. Moreover, if we want to avoid another immediate mismatch, the character following the prefix $v$ in the pattern must be different from $a$. The longest such prefix $v$ is called the tagged border of $u$ (it occurs at both ends of $u$ followed by different characters in $x$). This introduces the notation: let kmpNext[i] be the length of the longest border of $x[0..i-1]$ followed by a character $c$ different from $x[i]$, and $-1$ if no such tagged border exists, for $0 < i \le m$. Then, after a shift, the comparisons can resume between characters $x[\mathrm{kmpNext}[i]]$ and $y[i+j]$ without missing any occurrence of $x$ in $y$, and avoiding a backtrack on the text (see figure 7.1). The value of kmpNext[0] is set to $-1$. The table kmpNext can be computed in $O(m)$ space and time before the searching phase, applying the same searching algorithm to the pattern itself, as if $x = y$.

    [Figure 7.1: Shift in the Knuth-Morris-Pratt algorithm: $v$ is a border of $u$ and $a \ne c$.]

    The searching phase can be performed in $O(m + n)$ time. The Knuth-Morris-Pratt algorithm performs at most $2n - 1$ text character comparisons during the searching phase. The delay (maximum number of comparisons for a single text character) is bounded by $\log_\Phi(m)$ where $\Phi$ is the golden ratio ($\Phi = \frac{1+\sqrt{5}}{2}$).

    7.3 The C code

    /* Fills kmpNext[0..m]. The last iteration reads x[m], so x is
       expected to be a null-terminated string. */
    void preKmp(char *x, int m, int kmpNext[]) {
       int i, j;

       i = 0;
       j = kmpNext[0] = -1;
       while (i < m) {
          while (j > -1 && x[i] != x[j])
             j = kmpNext[j];
          i++;
          j++;
          if (x[i] == x[j])
             kmpNext[i] = kmpNext[j];
          else
             kmpNext[i] = j;
       }
    }

    void KMP(char *x, int m, char *y, int n) {
       int i, j, kmpNext[XSIZE];

       /* Preprocessing */
       preKmp(x, m, kmpNext);

       /* Searching */
       i = j = 0;
       while (j < n) {
          while (i > -1 && x[i] != y[j])
             i = kmpNext[i];
          i++;
          j++;
          if (i >= m) {
             OUTPUT(j - i);
             i = kmpNext[i];
          }
       }
    }

    7.4 The example

    i            0  1  2  3  4  5  6  7  8
    x[i]         G  C  A  G  A  G  A  G
    kmpNext[i]  -1  0  0 -1  1 -1  1 -1  1

    Searching phase

    First attempt:
    y  GCATCGCAGAGAGTATACAGTACG
       1234
    x  GCAGAGAG
    Shift by 4 (i - kmpNext[i] = 3 - (-1))

    Second attempt:
    y  GCATCGCAGAGAGTATACAGTACG
           1
    x      GCAGAGAG
    Shift by 1 (i - kmpNext[i] = 0 - (-1))

    Third attempt:
    y  GCATCGCAGAGAGTATACAGTACG
            12345678
    x       GCAGAGAG
    Shift by 7 (i - kmpNext[i] = 8 - 1)

    Fourth attempt:
    y  GCATCGCAGAGAGTATACAGTACG
                    2
    x              GCAGAGAG
    Shift by 1 (i - kmpNext[i] = 1 - 0)

    Fifth attempt:
    y  GCATCGCAGAGAGTATACAGTACG
                    1
    x               GCAGAGAG
    Shift by 1 (i - kmpNext[i] = 0 - (-1))

    Sixth attempt:
    y  GCATCGCAGAGAGTATACAGTACG
                     1
    x                GCAGAGAG
    Shift by 1 (i - kmpNext[i] = 0 - (-1))

    Seventh attempt:
    y  GCATCGCAGAGAGTATACAGTACG
                      1
    x                 GCAGAGAG
    Shift by 1 (i - kmpNext[i] = 0 - (-1))

    Eighth attempt:
    y  GCATCGCAGAGAGTATACAGTACG
                       1
    x                  GCAGAGAG
    Shift by 1 (i - kmpNext[i] = 0 - (-1))

    The Knuth-Morris-Pratt algorithm performs 18 text character comparisons on the example.
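The two comparison counts (19 for Morris-Pratt in chapter 6, 18 for Knuth-Morris-Pratt here) can be checked with one search loop run on either table. This is a sketch under an explicit assumption: the count stops as soon as the window no longer fits in the text, which is how the attempts in the examples appear to be counted. The names pre_mp, pre_kmp and count_comparisons are hypothetical, and pre_kmp guards the read of x[m] instead of relying on a terminating null character.

```c
#include <assert.h>

/* Morris-Pratt table: next[i] = length of the longest border of x[0..i-1]. */
static void pre_mp(const char *x, int m, int next[]) {
    int i = 0, j = -1;
    next[0] = -1;
    while (i < m) {
        while (j > -1 && x[i] != x[j])
            j = next[j];
        next[++i] = ++j;
    }
}

/* Knuth-Morris-Pratt table: same borders, but skipping those followed
   by the character that has just mismatched. */
static void pre_kmp(const char *x, int m, int next[]) {
    int i = 0, j = -1;
    next[0] = -1;
    while (i < m) {
        while (j > -1 && x[i] != x[j])
            j = next[j];
        i++;
        j++;
        if (i < m && x[i] == x[j])
            next[i] = next[j];
        else
            next[i] = j;
    }
}

/* Counts text character comparisons while the window start j - i
   still satisfies j - i <= n - m. */
static int count_comparisons(const char *x, int m, const char *y, int n,
                             const int next[]) {
    int i = 0, j = 0, count = 0;
    assert(m >= 1 && m <= n);
    while (j < n && j - i <= n - m) {
        while (i > -1) {
            ++count;                 /* one comparison of x[i] with y[j] */
            if (x[i] == y[j])
                break;
            i = next[i];
        }
        i++;
        j++;
        if (i >= m)                  /* occurrence found at j - i */
            i = next[i];
    }
    return count;
}
```

With x = GCAGAGAG and the example text, the function returns 19 for the table built by pre_mp and 18 for the table built by pre_kmp; the single saved comparison comes from the first attempt, where kmpNext[3] = -1 avoids comparing y[3] a second time.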

    7.5 References

    Aho, A.V., 1990, Algorithms for Finding Patterns in Strings, in Handbook of Theoretical Computer Science, Volume A, Algorithms and complexity, J. van Leeuwen ed., Chapter 5, pp 255-300, Elsevier, Amsterdam.

    Knuth, D.E., Morris, Jr, J.H., Pratt, V.R., 1977, Fast pattern matching in strings, SIAM Journal on Computing 6(1):323-350.

    Sedgewick, R., 1988, Algorithms, Chapter 19, pp. 277-292, Addison-Wesley Publishing Company.

    Sedgewick, R., 1992, Algorithms in C, Chapter 19, Addison-Wesley Publishing Company.

    Sedgewick, R., Flajolet, P., 1996, An Introduction to the Analysis of Algorithms, Chapter 7, Addison-Wesley Publishing Company.

    Stephen, G.A., 1994, String Searching Algorithms, World Scientific.

    Watson, B.W., 1995, Taxonomies and Toolkits of Regular Language Algorithms, PhD Thesis, Eindhoven University of Technology, The Netherlands.

    Wirth, N., 1986, Algorithms & Data Structures, Chapter 1, pp. 17-72, Prentice-Hall.


    8 Simon algorithm

    8.1 Main features

    - economical implementation of $A(x)$, the minimal Deterministic Finite Automaton recognizing $\Sigma^* x$;
    - preprocessing phase in $O(m)$ time and space complexity;
    - searching phase in $O(m + n)$ time complexity (independent of the alphabet size);
    - at most $2n - 1$ text character comparisons during the searching phase;
    - delay bounded by $\min\{1 + \log_2 m, \sigma\}$.

    8.2 Description

    The main drawback of the search with the minimal DFA $A(x)$ (see chapter 3) is the size of the automaton: $O(m \times \sigma)$. Simon noticed that there are only a few significant edges in $A(x)$; they are:

    - the forward edges going from the prefix of $x$ of length $k$ to the prefix of length $k + 1$ for $0 \le k < m$. There are exactly $m$ such edges;
    - the backward edges going from the prefix of $x$ of length $k$ to a smaller non-zero length prefix. The number of such edges is bounded by $m$.

    The other edges lead to the initial state and can then be deduced. Thus the number of significant edges is bounded by $2m$. Then for each state of the automaton it is only necessary to store the list of its significant outgoing edges.

    Each state is represented by the length of its associated prefix minus 1, so that each edge leading to state $i$, with $-1 \le i \le m - 1$, is labelled by $x[i]$; thus it is not necessary to store the labels of the edges. The forward edges can be easily deduced from the pattern, thus they are not stored. It only remains to store the significant backward edges.
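The bound on the number of backward edges can be checked by brute force on a small pattern. The sketch below is not Simon's construction: it recomputes each transition of the automaton naively and counts the non-forward edges that reach a non-empty prefix. The names dfa_target and count_backward_edges, the explicit alphabet string, and the choice to count only edges leaving the non-final states are assumptions of this example.

```c
#include <assert.h>
#include <string.h>

/* Length of the longest prefix of x that is a suffix of x[0..k-1]
   followed by the character c (0 if there is none). */
static int dfa_target(const char *x, int m, int k, char c) {
    int max_len = (k + 1 < m) ? k + 1 : m;
    for (int len = max_len; len > 0; --len) {
        if (x[len - 1] != c)
            continue;                /* last characters must agree */
        /* Remaining len-1 characters: x[0..len-2] against the end of x[0..k-1]. */
        if (memcmp(x, x + (k + 1 - len), (size_t)(len - 1)) == 0)
            return len;
    }
    return 0;
}

/* Counts edges leaving the prefixes of length 0..m-1 that are not
   forward edges and do not lead back to the empty prefix. */
static int count_backward_edges(const char *x, int m, const char *alphabet) {
    int count = 0;
    assert(m >= 1);
    for (int k = 0; k < m; ++k)
        for (const char *c = alphabet; *c != '\0'; ++c)
            if (*c != x[k] && dfa_target(x, m, k, *c) > 0)
                ++count;
    return count;
}
```

For x = GCAGAGAG over the alphabet ACGT the count is 6, below the bound m = 8: four edges labelled G and two labelled C.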


             state = ell;
          }
       }
    }

    8.4 The example

    [Figure: the automaton for x = GCAGAGAG, with states -1 to 7, forward edges labelled G, C, A, G, A, G, A, G, four backward edges labelled G and two backward edges labelled C.]

    The states are labelled by the length of the prefix they are associated with minus 1.

    i      0    1    2    3      4    5      6
    L[i]   (0)  (0)  ∅    (0,1)  ∅    (0,1)  ∅

    Searching phase

    The initial state is -1.

    j    y[j]   state
    0    G      0
    1    C      1
    2    A      2


    j    y[j]   state
    13   T      -1
    14   A      -1
    15   T      -1
    16   A      -1
    17   C      -1
    18   A      -1
    19   G      0
    20   T      -1
    21   A      -1
    22   C      -1


          i = k;
       } while (k <= m);

       /* Computation of kmin */
       memset(kmin, 0, m*sizeof(int));
       for (i = m; i >= 1; --i)
          if (hmax[i] < m)
             kmin[hmax[i]] = i;

       /* Computation of rmin */
       for (i = m - 1; i >= 0; --i) {
          if (hmax[i + 1] == m)
             r = i + 1;
          if (kmin[i] == 0)
             rmin[i] = r;
          else
             rmin[i] = 0;
       }

       /* Computation of h */
       s = -1;
       r = m;
       for (i = 0; i < m; ++i)
          if (kmin[i] == 0)
             h[--r] = i;
          else
             h[++s] = i;
       nd = s;

       /* Computation of shift */
       for (i = 0; i <= nd; ++i)
          shift[i] = kmin[h[i]];
       for (i = nd + 1; i < m; ++i)
          shift[i] = rmin[h[i]];
       shift[m] = rmin[0];

       /* Computation of nhd0 */
       s = 0;
       for (i = 0; i < m; ++i) {
          nhd0[i] = s;
          if (kmin[i] > 0)
             ++s;
       }

       /* Computation of next */
       for (i = 0; i <= nd; ++i)
          next[i] = nhd0[h[i] - kmin[h[i]]];
       for (i = nd + 1; i < m; ++i)
          next[i] = nhd0[m - rmin[h[i]]];
       next[m] = nhd0[m - rmin[h[m - 1]]];

       return(nd);
    }

    void COLUSSI(char *x, int m, char *y, int n) {
       int i, j, last, nd,
           h[XSIZE], next[XSIZE], shift[XSIZE];

       /* Processing */
       nd = preColussi(x, m, h, next, shift);

       /* Searching */
       i = j = 0;
       last = -1;
       while (j <= n - m) {
          while (i < m && last < j + h[i] &&
                 x[h[i]] == y[j + h[i]])
             i++;
          if (i >= m || last >= j + h[i]) {
             OUTPUT(j);
             i = m;
          }
          if (i > nd)
             last = j + m - 1;
          j += shift[i];
          i = next[i];
       }
    }


    9.4 The example

    i           0   1   2   3   4   5   6   7   8
    x[i]        G   C   A   G   A   G   A   G
    kmpNext[i] -1   0   0  -1   1  -1   1  -1   1
    kmin[i]     0   1   2   0   3   0   5   0
    h[i]        1   2   4   6   7   5   3   0
    next[i]     0   0   0   0   0   0   0   0   0
    shift[i]    1   2   3   5   8   7   7   7   7
    hmax[i]     0   1   2   4   4   6   6   8   8
    rmin[i]     7   0   0   7   0   7   0   8
    nhd0[i]     0   0   1   2   2   3   3   4

    nd = 3

    Searching phase

    First attempt:

    y    G C A T C G C A G A G A G T A T A C A G T A C G
           1 2   3
    x    G C A G A G A G

    Shift by 3 (shift[2])

    Second attempt:

    y    G C A T C G C A G A G A G T A T A C A G T A C G
                 1 2
    x          G C A G A G A G

    Shift by 2 (shift[1])

    Third attempt:

    y    G C A T C G C A G A G A G T A T A C A G T A C G
                   8 1 2 7 3 6 4 5
    x              G C A G A G A G

    Shift by 7 (shift[8])


    Fourth attempt:

    y    G C A T C G C A G A G A G T A T A C A G T A C G
                                   1
    x                            G C A G A G A G

    Shift by 1 (shift[0])

    Fifth attempt:

    y    G C A T C G C A G A G A G T A T A C A G T A C G
                                     1
    x                              G C A G A G A G

    Shift by 1 (shift[0])

    Sixth attempt:

    y    G C A T C G C A G A G A G T A T A C A G T A C G
                                       1
    x                                G C A G A G A G

    Shift by 1 (shift[0])

    Seventh attempt:

    y    G C A T C G C A G A G A G T A T A C A G T A C G
                                         1
    x                                  G C A G A G A G

    Shift by 1 (shift[0])

    Eighth attempt:

    y    G C A T C G C A G A G A G T A T A C A G T A C G
                                           1 2   3
    x                                    G C A G A G A G

    Shift by 3 (shift[2])

    The Colussi algorithm performs 20 text character comparisons on the
    example.
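The comparison count of this example can be checked mechanically. The sketch below is not the book's code: it hard-codes the h, next and shift tables listed for x = GCAGAGAG and replays the searching loop of COLUSSI with an added comparison counter. The function name colussi_example and its out, max_out and cmp parameters are hypothetical, introduced only for this illustration.

```c
/* Tables copied from the example for x = GCAGAGAG (m = 8, nd = 3). */
static const int H[8]     = {1, 2, 4, 6, 7, 5, 3, 0};
static const int NEXT[9]  = {0, 0, 0, 0, 0, 0, 0, 0, 0};
static const int SHIFT[9] = {1, 2, 3, 5, 8, 7, 7, 7, 7};

/* Hypothetical helper: runs the Colussi searching loop for the fixed
   pattern GCAGAGAG on y[0..n-1]. Match positions are stored in out[]
   (at most max_out of them), every text character comparison is added
   to *cmp, and the total number of matches is returned. */
int colussi_example(const char *y, int n, int *out, int max_out, long *cmp) {
    const char *x = "GCAGAGAG";
    const int m = 8, nd = 3;
    int i = 0, j = 0, last = -1, found = 0;

    while (j <= n - m) {
        /* Same condition as the original loop, split so that the
           character comparison can be counted. */
        while (i < m && last < j + H[i]) {
            ++*cmp;
            if (x[H[i]] != y[j + H[i]])
                break;
            ++i;
        }
        if (i >= m || last >= j + H[i]) {   /* the whole window matches */
            if (found < max_out)
                out[found] = j;
            ++found;
            i = m;
        }
        if (i > nd)
            last = j + m - 1;
        j += SHIFT[i];
        i = NEXT[i];
    }
    return found;
}
```

Called on the 24-character text of the example with *cmp initialised to 0, this sketch should report the single occurrence at position 5 and leave *cmp at 20, in agreement with the eight attempts traced in the example.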

    9.5 References

    Breslauer, D., 1992, Efficient String Algorithmics, PhD Thesis,
    Report CU-024-92, Computer Science Department, Columbia University,
    New York, NY.


    10 Galil-Giancarlo algorithm

    10.1 Main features

    refinement of Colussi algorithm
    preprocessing phase in O(m) time and space complexity
    searching phase in O(n) time complexity
    performs $\frac{4}{3}n$ text character comparisons in the worst case.

    10.2 Description

    The Galil-Giancarlo algorithm is a variant of the Colussi algorithm
    (see chapter 9). The change intervenes in the searching phase. The
    method applies when $x$ is not a power of a single character, that is,
    $x \neq c^m$ with $c \in \Sigma$. Let $\ell$ be the last index in the
    pattern such that $x[0] = x[i]$ for $0 \leq i \leq \ell$, and
    $x[0] \neq x[\ell+1]$. Assume that during the previous attempt all the
    noholes have been matched and a suffix of the pattern has been matched,
    meaning that after the corresponding shift a prefix of the pattern will
    still match a part of the text. Thus the window is positioned on the
    text factor $y[j \,..\, j+m-1]$ and the portion $y[j \,..\, last]$
    matches $x[0 \,..\, last-j]$. Then during the next attempt the algorithm
    scans the text characters beginning with $y[last+1]$ until either the
    end of the text is reached or a character $y[j+k] \neq x[0]$ is found.
    In this latter case two subcases can arise:

    $x[\ell+1] \neq y[j+k]$ or too few $x[0]$ have been found
    ($k \leq \ell$): then the window is shifted and positioned on the text
    factor $y[j+k+1 \,..\, j+k+m]$, the scanning of the text is resumed (as
    in the Colussi algorithm) with the first nohole, and the memorized
    prefix of the pattern is the empty word.

    $x[\ell+1] = y[j+k]$ and enough $x[0]$ have been found ($k > \ell$):
    then the window is shifted and positioned on the text factor
    $y[j+k-\ell-1 \,..\, j+k-\ell+m-2]$, the scanning of the text is resumed
    (as in the Colussi algorithm) with the second nohole ($x[\ell+1]$ is the
    first one), and the memorized prefix of the pattern is
    $x[0 \,..\, \ell+1]$.

    The preprocessing phase is exactly the same as in the Colussi algorithm
    (chapter 9) and can be done in O(m) space and time. The searching phase
    can then be done in O(n) time complexity and furthermore at most
    $\frac{4}{3}n$ text character comparisons are performed during the
    searching phase.
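The two ingredients that are specific to this variant, the index $\ell$ and the special handling of a pattern that is a power of a single character, can be illustrated in isolation. The following is a minimal sketch under stated assumptions, not the book's GG function: leading_run and search_power are hypothetical helper names, and matches are written to a caller-supplied array instead of being passed to OUTPUT.

```c
/* Hypothetical helper: largest ell such that x[0] == x[1] == ... == x[ell].
   Returns m - 1 when x is a power of a single character. Unlike a loop
   that relies on the terminating NUL, it never reads x[m]. */
int leading_run(const char *x, int m) {
    int ell = 0;
    while (ell + 1 < m && x[ell] == x[ell + 1])
        ++ell;
    return ell;
}

/* Hypothetical helper for the degenerate case x = c^m: keep the length
   of the current run of c in the text and report an occurrence each
   time the run reaches length m. Stores at most max_out positions in
   out[] and returns the total number of occurrences. */
int search_power(char c, int m, const char *y, int n, int *out, int max_out) {
    int run = 0, found = 0;
    for (int j = 0; j < n; ++j) {
        if (y[j] == c) {
            if (++run >= m) {
                if (found < max_out)
                    out[found] = j - m + 1;
                ++found;
            }
        } else {
            run = 0;
        }
    }
    return found;
}
```

For the pattern GCAGAGAG used in the examples, leading_run gives 0, because x[0] = G already differs from x[1] = C. For a pattern such as aaab it gives 2.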

    10.3 The C code

    The function preColussi is given in chapter 9.

    void GG(char *x, int m, char *y, int n) {
       int i, j, k, ell, last, nd;
       int h[XSIZE], next[XSIZE], shift[XSIZE];
       char heavy;

       for (ell = 0; x[ell] == x[ell + 1]; ell++);
       if (ell == m - 1)
          /* Searching for a power of a single character */
          for (j = ell = 0; j < n; ++j)
             if (x[0] == y[j]) {
                ++ell;
                if (ell >= m)
                   OUTPUT(j - m + 1);
             }
             else
                ell = 0;
       else {
          /* Preprocessing */
          nd = preColussi(x, m, h, next, shift);

          /* Searching */
          i = j = heavy = 0;
          last = -1;
          while (j <= n - m) {
             if (heavy && i == 0) {
                k = last - j + 1;
                while (x[0] == y[j + k])
                   k++;
                if (k <= ell || x[ell + 1] != y[j + k]) {
                   i = 0;
                   j += (k + 1);
                   last = j - 1;
                }


                else {
                   i = 1;
                   last = j + k;
                   j = last - (ell + 1);
                }
                heavy = 0;
             }
             else {
                while (i < m && last < j + h[i] &&
                       x[h[i]] == y[j + h[i]])
                   ++i;
                if (i >= m || last >= j + h[i]) {
                   OUTPUT(j);
                   i = m;
                }
                if (i > nd)
                   last = j + m - 1;
                j += shift[i];
                i = next[i];
             }
             heavy = (j > last ? 0 : 1);
          }
       }
    }

    10.4 The example

    i           0   1   2   3   4   5   6   7   8
    x[i]        G   C   A   G   A   G   A   G
    kmpNext[i] -1   0   0  -1   1  -1   1  -1   1
    kmin[i]     0   1   2   0   3   0   5   0
    h[i]        1   2   4   6   7   5   3   0
    next[i]     0   0   0   0   0   0   0   0   0
    shift[i]    1   2   3   5   8   7   7   7   7
    hmax[i]     0   1   2   4   4   6   6   8   8
    rmin[i]     7   0   0   7   0   7   0   8
    nhd0[i]     0   0   1   2   2   3   3   4

    nd = 3 and $\ell = 0$


    Seventh attempt:

    y    G C A T C G C A G A G A G T A T A C A G T A C G
                                           1 2   3
    x                                    G C A G A G A G

    Shift by 3 (shift[2])

    The Galil-Giancarlo algorithm performs 19 text character comparisons
    on the example.

    10.5 References

    Breslauer, D., 1992, Efficient String Algorithmics, PhD Thesis,
    Report CU-024-92, Computer Science Department, Columbia University,
    New York, NY.

    Galil, Z., Giancarlo, R., 1992, On the exact complexity of string
    matching: upper bounds, SIAM Journal on Computing 21(3):407-437.


    Chapter 11 Apostolico-Crochemore algorithm

    We now explain how to compute the next triple after $(i, j, k)$ has
    been computed. Three cases arise depending on the value of $i$:

    $i = \ell$
       If $x[i] = y[i+j]$ then the next triple is $(i+1, j, k)$.
       If $x[i] \neq y[i+j]$ then the next triple is
       $(\ell, j+1, \max\{0, k-1\})$.

    $\ell < i < m$
       If $x[i] = y[i+j]$ then the next triple is $(i+1, j, k)$.
       If $x[i] \neq y[i+j]$ then two cases arise depending on the value
       of $\mathit{kmpNext}[i]$:
          $\mathit{kmpNext}[i] \leq \ell$: then the next triple is
          $(\ell, i+j-\mathit{kmpNext}[i], \max\{0, \mathit{kmpNext}[i]\})$.
          $\mathit{kmpNext}[i] > \ell$: then the next triple is
          $(\mathit{kmpNext}[i], i+j-\mathit{kmpNext}[i], \ell)$.

    $i = m$
       If $k < \ell$ and $x[k] = y[j+k]$ then the next triple is
       $(i, j, k+1)$.
       Otherwise either $k < \ell$ and $x[k] \neq y[j+k]$, or $k = \ell$.
       If $k = \ell$ an occurrence of x is reported. In both cases the next
       triple is computed in the same manner as in the case where
       $\ell < i < m$.

    The preprocessing phase consists in computing the table kmpNext and
    the integer $\ell$. It can be done in O(m) space and time. The
    searching phase is in O(n) time complexity and furthermore the
    Apostolico-Crochemore algorithm performs at most $\frac{3}{2}n$ text
    character comparisons in the worst case.
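The three cases above translate fairly directly into a loop over the triple (i, j, k). The sketch below is one possible reading of those rules, offered as an illustration and not as the book's AXAMAC listing. The names pre_kmp, ac_search, AC_XSIZE and AC_MAX are hypothetical, pre_kmp is assumed to behave like the preKmp function of chapter 7, and both x and y are assumed to be ordinary NUL-terminated C strings.

```c
#define AC_XSIZE 256                      /* assumed bound on m + 1 */
#define AC_MAX(a, b) ((a) > (b) ? (a) : (b))

/* Assumed equivalent of preKmp: fills kmp_next[0..m]. Reads x[m], so x
   must be NUL-terminated. */
static void pre_kmp(const char *x, int m, int *kmp_next) {
    int i = 0, j = -1;
    kmp_next[0] = -1;
    while (i < m) {
        while (j > -1 && x[i] != x[j])
            j = kmp_next[j];
        ++i;
        ++j;
        kmp_next[i] = (x[i] == x[j]) ? kmp_next[j] : j;
    }
}

/* Hypothetical search routine following the triple rules. Stores at most
   max_out match positions in out[] and returns the number of matches,
   or -1 if m is outside the supported range. */
int ac_search(const char *x, int m, const char *y, int n,
              int *out, int max_out) {
    int kmp_next[AC_XSIZE];
    int i, j = 0, k = 0, ell, found = 0;

    if (m < 1 || m >= AC_XSIZE)
        return -1;
    pre_kmp(x, m, kmp_next);
    for (ell = 1; ell < m && x[ell - 1] == x[ell]; ++ell)
        ;
    if (ell == m)                         /* x is a power of one character */
        ell = 0;

    i = ell;
    while (j <= n - m) {
        while (i < m && x[i] == y[i + j]) /* scan x[ell..m-1] first */
            ++i;
        if (i >= m) {                     /* case i = m: check x[0..ell-1] */
            while (k < ell && x[k] == y[j + k])
                ++k;
            if (k >= ell) {
                if (found < max_out)
                    out[found] = j;
                ++found;
            }
        }
        j += i - kmp_next[i];
        if (i == ell) {                   /* case i = ell, mismatch */
            k = AC_MAX(0, k - 1);
        } else if (kmp_next[i] <= ell) {  /* kmpNext[i] <= ell */
            k = AC_MAX(0, kmp_next[i]);
            i = ell;
        } else {                          /* kmpNext[i] > ell */
            k = ell;
            i = kmp_next[i];
        }
    }
    return found;
}
```

For x = GCAGAGAG the loop `for (ell = 1; ...)` stops immediately with ell = 1, and pre_kmp should reproduce the kmpNext row of the example tables (-1 0 0 -1 1 -1 1 -1 1).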

    11.3 The C code

    The function preKmp is given in chapter 7.

    void AXAMAC(char *x, int m, char *y, int n) {
       int i, j, k, ell, kmpNext[XSIZE];

       /* Preprocessing */
       preKmp(x, m, kmpNext);
       for (ell = 1; x[ell - 1] == x[ell]; ell++);
       if (ell == m)
          ell = 0;

       /* Searching */
       i = ell;
       j = k = 0;
       while (j <= n - m) {
          while (i < m && x[i] == y[i + j])
             ++i;


    12 Not So Naive algorithm

    12.1 Main features

    preprocessing phase in constant time complexity
    constant extra space complexity
    searching phase in O(mn) time complexity
    (slightly) sub-linear in the average case.

    12.2 Description

    During the searching phase of the Not So Naive algorithm the character
    comparisons are made with the pattern positions in the following order:
    $1, 2, \ldots, m-2, m-1, 0$.

    For each attempt where the window is positioned on the text factor
    $y[j \,..\, j+m-1]$: if $x[0] = x[1]$ and $x[1] \neq y[j+1]$, or if
    $x[0] \neq x[1]$ and $x[1] = y[j+1]$, the pattern is shifted by 2
    positions at the end of the attempt, and by 1 otherwise.

    Thus the preprocessing phase can be done in constant time and space.
    The searching phase of the Not So Naive algorithm has a quadratic worst
    case but it is slightly sub-linear in the average case.
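The shift rule fits in a few lines. The sketch below is written from the description above and is not the book's NSN listing: nsn_search, shift_mismatch and shift_match are hypothetical names, matches are returned through an array, and the pattern is assumed to have at least two characters because the rule inspects x[1].

```c
#include <string.h>

/* Hypothetical implementation of the Not So Naive shift rule. Stores at
   most max_out match positions in out[] and returns the number of
   matches, or -1 when m < 2. */
int nsn_search(const char *x, int m, const char *y, int n,
               int *out, int max_out) {
    int j = 0, found = 0;
    int shift_mismatch, shift_match;

    if (m < 2)
        return -1;

    /* Preprocessing: constant time, two integers. */
    if (x[0] == x[1]) {
        shift_mismatch = 2;   /* x[1] != y[j+1] implies x[0] != y[j+1] */
        shift_match = 1;
    } else {
        shift_mismatch = 1;
        shift_match = 2;      /* x[1] == y[j+1] implies x[0] != y[j+1] */
    }

    /* Searching: compare position 1 first, then 2..m-1, then 0. */
    while (j <= n - m) {
        if (x[1] != y[j + 1]) {
            j += shift_mismatch;
        } else {
            if (memcmp(x + 2, y + j + 2, (size_t)(m - 2)) == 0 &&
                x[0] == y[j]) {
                if (found < max_out)
                    out[found] = j;
                ++found;
            }
            j += shift_match;
        }
    }
    return found;
}
```

The shift of 2 is safe in both cases for the same reason: the comparison at position 1 already shows that x[0] cannot equal y[j+1], so no occurrence can start at j + 1.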

    12.3 The C code

    void NSN(char *x, int m, char *y, int n) {
       int j, k, ell;

       /* Preprocessing */
       if (x[0] == x[1]) {
          k = 2;
          ell = 1;


    13 Forward Dawg Matching algorithm

    13.1 Main Features

    uses the suffix automaton of x
    O(n) worst case time complexity
    performs exactly n text character inspections.

    13.2 Description

    The Forward Dawg Matching algorithm computes the longest factor of the
    pattern ending at each position in the text. This is made possible by
    the use of the smallest suffix automaton (also called DAWG for Directed
    Acyclic Word Graph) of the pattern. The smallest suffix automaton of a
    word $w$ is a Deterministic Finite Automaton
    $S(w) = (Q, q_0, T, E)$. The language accepted by $S(w)$ is
    $L(S(w)) = \{u \in \Sigma^* : \exists v \in \Sigma^* \text{ such that } w = vu\}$.
    The preprocessing phase of the Forward Dawg Matching algorithm consists
    in computing the smallest suffix automaton for the pattern x. It is
    linear in time and space in the length of the pattern.

    During the searching phase the Forward Dawg Matching algorithm parses
    the characters of the text from left to right with the automaton
    $S(x)$, starting with state $q_0$. For each state $q \in S(x)$ the
    length of the longest path from $q_0$ to $q$ is denoted by length(q).
    This structure extensively uses the notion of suffix links. For each
    state $p$ the suffix link of $p$ is denoted by $S[p]$. For a state $p$,
    let $\mathrm{Path}(p) = (p_0, p_1, \ldots, p_\ell)$ be the suffix path
    of $p$ such that $p_0 = p$, $p_i = S[p_{i-1}]$ for
    $1 \leq i \leq \ell$, and $p_\ell = q_0$. For each text character
    $y[j]$ sequentially, let $p$ be the current state; the Forward Dawg
    Matching algorithm then takes a transition defined for $y[j]$ from the
    first state of Path(p) for which such a transition is defined. The
    current state $p$ is updated with the target state of this transition,
    or with the initial state $q_0$ if no transition labelled with $y[j]$
    exists from a state of Path(p).


    An occurrence of x is found when length(p) = m.

    The Forward Dawg Matching algorithm performs exactly n text character
    inspections.
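To make the search self-contained without the automaton functions of section 1.5, the sketch below builds a suffix automaton with the usual on-line construction (one state added per character, plus an occasional clone) and then scans the text as described above. It is an illustration under stated assumptions and does not use the book's Graph interface: FdmState, fdm_extend, fdm_search and FDM_ASIZE are hypothetical names, and transitions are stored in a plain array indexed by byte value.

```c
#include <stdlib.h>
#include <string.h>

#define FDM_ASIZE 256          /* assumed alphabet: all byte values */

typedef struct {
    int len;                   /* length of the longest word reaching the state */
    int link;                  /* suffix link, -1 for the initial state */
    int next[FDM_ASIZE];       /* transitions, -1 when undefined */
} FdmState;

/* Appends character c to the automaton and returns the new last state. */
static int fdm_extend(FdmState *st, int *size, int last, unsigned char c) {
    int cur = (*size)++, p = last;
    st[cur].len = st[last].len + 1;
    st[cur].link = 0;
    memset(st[cur].next, -1, sizeof st[cur].next);
    while (p != -1 && st[p].next[c] == -1) {
        st[p].next[c] = cur;
        p = st[p].link;
    }
    if (p != -1) {
        int q = st[p].next[c];
        if (st[p].len + 1 == st[q].len) {
            st[cur].link = q;
        } else {               /* split q by cloning it */
            int clone = (*size)++;
            st[clone] = st[q];
            st[clone].len = st[p].len + 1;
            while (p != -1 && st[p].next[c] == q) {
                st[p].next[c] = clone;
                p = st[p].link;
            }
            st[q].link = st[cur].link = clone;
        }
    }
    return cur;
}

/* Hypothetical search routine. Stores at most max_out match positions in
   out[] and returns the number of matches, or -1 on bad input or
   allocation failure. */
int fdm_search(const char *x, int m, const char *y, int n,
               int *out, int max_out) {
    FdmState *st;
    int size = 1, last = 0, state = 0, ell = 0, found = 0;

    if (m < 1)
        return -1;
    st = malloc((size_t)(2 * m + 2) * sizeof *st);
    if (st == NULL)
        return -1;
    st[0].len = 0;
    st[0].link = -1;
    memset(st[0].next, -1, sizeof st[0].next);
    for (int i = 0; i < m; ++i)
        last = fdm_extend(st, &size, last, (unsigned char)x[i]);

    for (int j = 0; j < n; ++j) {
        unsigned char c = (unsigned char)y[j];
        if (st[state].next[c] != -1) {
            ++ell;                         /* extend the current factor */
            state = st[state].next[c];
        } else {
            while (state != 0 && st[state].next[c] == -1)
                state = st[state].link;    /* walk down the suffix path */
            if (st[state].next[c] != -1) {
                ell = st[state].len + 1;
                state = st[state].next[c];
            } else {
                ell = 0;
                state = 0;
            }
        }
        if (ell == m) {                    /* factor of length m ends at j */
            if (found < max_out)
                out[found] = j - m + 1;
            ++found;
        }
    }
    free(st);
    return found;
}
```

State numbers produced by this construction need not coincide with the numbering used in the book's example, since clones are created in a different order. Only the reported positions are meant to be comparable.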

    13.3 The C code

    The function buildSuffixAutomaton is given in chapter 25. All the other
    functions to build and manipulate the suffix automaton can be found in
    section 1.5.

    void FDM(char *x, int m, char *y, int n) {
       int j, init, ell, state;
       Graph aut;

       /* Preprocessing */
       aut = newSuffixAutomaton(2*(m + 2), 2*(m + 2)*ASIZE);
       buildSuffixAutomaton(x, m, aut);
       init = getInitial(aut);

       /* Searching */
       ell = 0;
       state = init;
       for (j = 0; j < n; ++j) {
          if (getTarget(aut, state, y[j]) != UNDEFINED) {
             ++ell;
             state = getTarget(aut, state, y[j]);
          }
          else {
             while (state != init &&
                    getTarget(aut, state, y[j]) == UNDEFINED)
                state = getSuffixLink(aut, state);
             if (getTarget(aut, state, y[j]) != UNDEFINED) {
                ell = getLength(aut, state) + 1;
                state = getTarget(aut, state, y[j]);
             }
             else {
                ell = 0;
                state = init;
             }
          }
          if (ell == m)
             OUTPUT(j - m + 1);
       }
    }


    13.4 The example

    [Figure: the suffix automaton of x = GCAGAGAG, with states numbered 0 to 12.]

    state         0   1   2   3   4   5   6   7   8   9  10  11  12
    suffix link   0   0   0   6   8  10   0  12   1  10   6  12   8
    length        0   1   2   3   4   5   1   6   2   7   3   8   4

    Searching phase

    The initial state is 0.

    y    G  C  A  T  C  G  C  A  G  A  G  A  G  T  A  T  A  C  A  G  T  A  C  G
         1  2  3  0

    y    G  C  A  T  C  G  C  A  G  A  G  A  G  T  A  T  A  C  A  G  T  A  C  G
                     2  0

    y    G  C  A  T  C  G  C  A  G  A  G  A  G  T  A  T  A  C  A  G  T  A  C  G
                        1  2  3  4  5  7  9  11 0


    14 Boyer-Moore algorithm

    14.1 Main Features

    performs the comparisons from right to left
    preprocessing phase in $O(m + \sigma)$ time and space complexity
    searching phase in O(mn) time complexity
    3n text character comparisons in the worst case when searching for a non periodic pattern

    O ( n = m ) b e s t p e r f