Post on 21-Dec-2015


Morris-Pratt Algorithm

Advisor: Prof. R. C. T. Lee

Speaker: C. W. Lu

A linear pattern-matching algorithm, Technical Report 40, University of California, Berkeley, 1970.

Morris (Jr), J. H., Pratt, V. R.

Morris-Pratt algorithm

We are given a text T and a pattern P, and we want to find all occurrences of P in T. The comparisons are performed from left to right.

n: the length of T
m: the length of P

Example

   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
T  A  A  A  A  A  A  T  C  A  C  A  T  T  A  G  C  A  A  A  A

P  A  T  C  A  C  A  G  T  A  T  C  A
   1  2  3  4  5  6  7  8  9 10 11 12

The basic principle of the MP algorithm is step-by-step comparison.

Initially, we compare T(1) with P(1). If T(1) ≠ P(1), we move the pattern one step towards the right.

Example

   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
T  A  A  A  A  A  A  T  C  A  C  A  T  T  A  G  C  A  A  A  A

P  C  T  C  A  C  A  G  T  A  T  C  A
   1  2  3  4  5  6  7  8  9 10 11 12

P     C  T  C  A  C  A  G  T  A  T  C  A
      1  2  3  4  5  6  7  8  9 10 11 12

Suppose the following condition occurs. Should we move the pattern P only one step towards the right?

The answer is no in this case, as we may use Rule 1, the suffix of T to prefix of P rule.

[Figure: P (positions 1 to m) is aligned with the window of T from position j to position j+m-1, where T has length n. A mismatch occurs between a = P(i) and b = T(i+j-1).]

Example

   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
T  A  A  A  A  A  A  T  C  A  C  A  T  T  A  G  C  A  A  A  A

P  A  T  C  A  C  A  G  T  A  T  C  A
   1  2  3  4  5  6  7  8  9 10 11 12


Rule 1: The Suffix of T to Prefix of P Rule

For a window to have any chance to match a pattern, in some way, there must be a suffix of the window which is equal to a prefix of the pattern.


The Implication of Rule 1:

Find the longest suffix v of the window which is equal to some prefix of P. Skip the pattern as follows:

[Figure: the window of T ends with a suffix v, and P begins with the same string v. P is moved to the right so that its prefix v lines up with the suffix v of the window.]

Now we know that a prefix U of the window of T is equal to a prefix U of P. Thus, instead of finding the longest suffix of the window equal to a prefix of P, we may simply find the longest suffix of U (other than U itself) which is equal to a prefix of P.

[Figure: in T, the matched part U is followed by the character b; in P, the prefix U is followed by the character a, and U ends with the suffix v.]

Example

   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
T  A  A  A  A  A  C  A  G  A  C  A  T  T  A  G  C  A  A  A  A
P                 C  A  G  A  C  A  G  T  A  T  C  A
                  1  2  3  4  5  6  7  8  9 10 11 12

Example

   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
T  A  A  A  A  A  C  A  G  A  C  A  T  T  A  G  C  A  A  A  A
P                 C  A  G  A  C  A  G  T  A  T  C  A
                  1  2  3  4  5  6  7  8  9 10 11 12

In this case, we can see that the longest suffix of U which is equal to a prefix of P is CA.

Thus, we may apply Rule 1 to move P as follows:

   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
T  A  A  A  A  A  C  A  G  A  C  A  T  T  A  G  C  A  A  A  A
P                             C  A  G  A  C  A  G  T  A  T  C  A
                              1  2  3  4  5  6  7  8  9 10 11 12

The MP Algorithm

Assume that we have already found the largest prefix U of the window of T which is equal to a prefix of P.

[Figure: in T, U is followed by the character b; in P, the prefix U is followed by the character a, and a ≠ b.]

The MP Algorithm

Skip the pattern by using Rule 1.

[Figure: before the move, the matched part ends with v in both T and P, P also starts with v, and the mismatch is between a in P and b in T. After the move, the prefix v of P lines up with the suffix v in T, and the character c that follows v in P is compared with b.]

Given a substring U of T which is equal to a prefix of P, how do we know the longest suffix of U which is equal to some prefix of P? We do this by pre-processing.

Prefix Function

• In the preprocessing phase, we construct a table named Prefix.

• Definition
  – For location i, let j be the largest value with j ≤ i, if it exists, such that P(0, j-1) is a suffix of P(0, i). Then Prefix(i) = j.
  – If, for P(0, i), there is no prefix equal to a suffix, Prefix(i) = 0.
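The definition can be checked directly with a brute-force sketch. This is only an illustration of the definition, not the efficient construction; the function name `prefix_by_definition` is hypothetical, and P is assumed to be a 0-indexed C string.

```c
#include <string.h>

/* Prefix(i) computed straight from the definition: the largest j <= i
 * such that P(0, j-1) is a suffix of P(0, i), or 0 if there is none.
 * Brute force, for illustration only. */
int prefix_by_definition(const char *P, int i)
{
    for (int j = i; j >= 1; j--) {
        /* P(0, j-1) has length j; the suffix of P(0, i) with the
         * same length starts at position i - j + 1. */
        if (memcmp(P, P + (i - j + 1), (size_t)j) == 0)
            return j;
    }
    return 0;
}
```

For example, with P = "cgcgagcgcgc", `prefix_by_definition(P, 9)` compares prefixes of length 9, 8, ..., 1 against the end of P(0, 9) and stops at the first one that fits.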

Example

• Note that we move the pattern i - Prefix(i-1) steps when a mismatch occurs at location i.

i         0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
P         b  c  b  a  b  c  b  a  e  b  c  b  a  b  c  b  a
Prefix    0  0  1  0  1  2  3  4  0  1  2  3  4  5  6  7  8
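The shift rule can be written as a one-line helper. A minimal sketch: the name `mp_shift` is hypothetical, and the handling of i = 0 (move one step when nothing has matched) is an assumption, since Prefix(-1) is not defined by the rule itself.

```c
/* Number of steps to move the pattern when a mismatch occurs at
 * location i of P (0-based), given the Prefix table of P.
 * Assumption: for i == 0 no character has matched, so P moves one step.
 * Passing i == m treats a full match as "all m characters matched". */
int mp_shift(const int *prefix, int i)
{
    if (i == 0)
        return 1;
    return i - prefix[i - 1];
}
```

For a Prefix table that begins 0, 0, 1, 0, 1, 2, 3, 4, a mismatch at location 8 gives 8 - 4 = 4 steps.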


How can we construct the Prefix Table Efficiently?

• To compute Prefix(i), we look at Prefix(i-1).

• In the following example, since Prefix(11) = 4, we know that there exists a prefix of length 4 which is equal to a suffix of length 4 of P(0, 11). Besides, P(4) = P(12). We may conclude that Prefix(12) = Prefix(11) + 1 = 4 + 1 = 5.

i         0  1  2  3  4  5  6  7  8  9 10 11 12
P         a  g  c  a  a  c  c  g  a  g  c  a  a
Prefix    0  0  0  1  1  0  0  0  1  2  3  4  5

Another Case

• Consider the following example.

• Prefix(9)=4. But P(4)≠P(10).

• Can we conclude that Prefix(10)=0?

• No, we cannot.

i         0  1  2  3  4  5  6  7  8  9 10
P         c  g  c  g  a  g  c  g  c  g  c
Prefix    0  0  1  2  0  0  1  2  3  4  ?

• There exists a shorter prefix with length 2 which is equal to a suffix of P(0, 9), and P(10)=P(2). We should conclude that Prefix(10)=2+1=3.

i         0  1  2  3  4  5  6  7  8  9 10
P         c  g  c  g  a  g  c  g  c  g  c
Prefix    0  0  1  2  0  0  1  2  3  4  3

• In other words, we may use the pointer idea expressed below:

• It may be necessary to examine P(0, j) to see whether there exists a prefix of P(0, j) equal to a suffix of P(0, j).

• Thus the Prefix function can be found recursively.

[Figure: pointer diagram on P with two regions labelled Y and X and pointers at positions j and i-1.]
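The recursive idea can be made concrete by listing every candidate length that the construction may try. A minimal sketch, assuming a Prefix table `f` that is already filled in up to index k; the function name `candidate_lengths` and the output array are hypothetical.

```c
/* Writes into out[] every length t > 0 such that the prefix of P of
 * length t is equal to a suffix of P(0, k), longest first, by following
 * the Prefix table recursively: f[k], f[f[k]-1], ...
 * Returns the number of lengths written. */
int candidate_lengths(const int *f, int k, int *out)
{
    int count = 0;
    int t = f[k];
    while (t > 0) {
        out[count++] = t;
        t = f[t - 1];   /* next shorter prefix that is also a suffix */
    }
    return count;
}
```

With f = 0, 0, 1, 2, 0, 0, 1, 2, 3, 4 (the table for P = cgcgagcgcg) and k = 9, the lengths visited are 4 and then 2, which is the order in which Prefix(10) is resolved in the example above.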

Construct the Prefix Function f

/* Compute the Prefix function f[0..m-1] of the pattern P of length m. */
void construct_prefix(const char *P, int m, int *f)
{
    int i, t;

    f[0] = 0;
    for (i = 1; i < m; i++) {
        t = f[i - 1];               /* t is the value of f(i-1) */
        while (t >= 0) {
            if (P[i] == P[t]) {     /* the prefix of length t can be extended */
                f[i] = t + 1;
                break;
            } else {
                if (t != 0) {
                    t = f[t - 1];   /* recursive */
                } else {
                    f[i] = 0;
                    break;
                }
            }
        }
    }
}

Example:

i         0  1  2  3  4  5  6  7  8  9 10
P         c  g  c  g  a  g  c  g  c  g  c
Prefix    0

i = 1:

t = f[i-1] = f[0] = 0;

∵ P[1] = g ≠ P[t] = P[0] = c ∴ f[1] = 0.

i         0  1  2  3  4  5  6  7  8  9 10
P         c  g  c  g  a  g  c  g  c  g  c
Prefix    0  0

i = 2:

t = f[i-1] = f[1] = 0;

∵ P[2] = c = P[t] = P[0] = c ∴ f[2] = t + 1 = 1.

i         0  1  2  3  4  5  6  7  8  9 10
P         c  g  c  g  a  g  c  g  c  g  c
Prefix    0  0  1

i = 4:

t = f[i-1] = f[3] = 2;

P[4] = a ≠ P[t] = P[2] = c, and t != 0;

t = f[t-1] = f[1] = 0;

∵ P[4] = a ≠ P[t] = P[0] = c ∴ f[4] = 0.

i         0  1  2  3  4  5  6  7  8  9 10
P         c  g  c  g  a  g  c  g  c  g  c
Prefix    0  0  1  2  0

i = 9:

t = f[i-1] = f[8] = 3;

∵ P[9] = g = P[t] = P[3] = g ∴ f[9] = t + 1 = 4.

i         0  1  2  3  4  5  6  7  8  9 10
P         c  g  c  g  a  g  c  g  c  g  c
Prefix    0  0  1  2  0  0  1  2  3  4

i = 10:

t = f[i-1] = f[9] = 4;

P[10] = c ≠ P[t] = P[4] = a, and t != 0;

t = f[t-1] = f[3] = 2;

∵ P[10] = c = P[t] = P[2] = c ∴ f[10] = t + 1 = 3.

i         0  1  2  3  4  5  6  7  8  9 10
P         c  g  c  g  a  g  c  g  c  g  c
Prefix    0  0  1  2  0  0  1  2  3  4  3

Example

   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
T  a  c  g  c  c  g  c  g  a  g  c  g  c  g  c  t  c  a  a  a
P  c  g  c  g  a  g  c  g  c  g  c
   0  1  2  3  4  5  6  7  8  9 10

Shift by 1

   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
T  a  c  g  c  c  g  c  g  a  g  c  g  c  g  c  t  c  a  a  a
P     c  g  c  g  a  g  c  g  c  g  c
      0  1  2  3  4  5  6  7  8  9 10

i                 0  1  2  3  4  5  6  7  8  9 10 11
i - prefix(i-1)   1  1  2  2  2  5  6  6  6  6  6  8

Example

   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
T  a  c  g  c  c  g  c  g  a  g  c  g  c  g  c  t  c  a  a  a
P     c  g  c  g  a  g  c  g  c  g  c
      0  1  2  3  4  5  6  7  8  9 10

Shift by 2

   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
T  a  c  g  c  c  g  c  g  a  g  c  g  c  g  c  t  c  a  a  a
P           c  g  c  g  a  g  c  g  c  g  c
            0  1  2  3  4  5  6  7  8  9 10

i                 0  1  2  3  4  5  6  7  8  9 10 11
i - prefix(i-1)   1  1  2  2  2  5  6  6  6  6  6  8

Example

   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
T  a  c  g  c  c  g  c  g  a  g  c  g  c  g  c  t  c  a  a  a
P           c  g  c  g  a  g  c  g  c  g  c
            0  1  2  3  4  5  6  7  8  9 10

Shift by 1

   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
T  a  c  g  c  c  g  c  g  a  g  c  g  c  g  c  t  c  a  a  a
P              c  g  c  g  a  g  c  g  c  g  c
               0  1  2  3  4  5  6  7  8  9 10

i                 0  1  2  3  4  5  6  7  8  9 10 11
i - prefix(i-1)   1  1  2  2  2  5  6  6  6  6  6  8

Example

   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
T  a  c  g  c  c  g  c  g  a  g  c  g  c  g  c  t  c  a  a  a
P              c  g  c  g  a  g  c  g  c  g  c
               0  1  2  3  4  5  6  7  8  9 10

Match!

Shift by 8

   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
T  a  c  g  c  c  g  c  g  a  g  c  g  c  g  c  t  c  a  a  a
P                                      c  g  c  g  a  g  c  g  c  g  c
                                       0  1  2  3  4  5  6  7  8  9 10

i                 0  1  2  3  4  5  6  7  8  9 10 11
i - prefix(i-1)   1  1  2  2  2  5  6  6  6  6  6  8
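The searching phase traced in this example can be sketched as follows. This is one common way to write the MP search, not necessarily the speaker's code: instead of physically moving P, it keeps the number of matched characters and falls back through the Prefix table, which has the same effect as moving P by i - Prefix(i-1) steps. The names `build_prefix` and `mp_search` and the caller-supplied buffers are assumptions.

```c
/* Build the Prefix table f[0..m-1] of P (same recurrence as in the slides). */
static void build_prefix(const char *P, int m, int *f)
{
    f[0] = 0;
    for (int i = 1; i < m; i++) {
        int t = f[i - 1];
        while (t > 0 && P[i] != P[t])
            t = f[t - 1];
        f[i] = (P[i] == P[t]) ? t + 1 : 0;
    }
}

/* Morris-Pratt search. Stores the 0-based start of each occurrence of P
 * in T in pos[] (at most max_pos entries) and returns how many occurrences
 * were found. f is scratch space with room for m ints. */
int mp_search(const char *T, int n, const char *P, int m,
              int *f, int *pos, int max_pos)
{
    int found = 0;
    if (m <= 0 || n < m)
        return 0;
    build_prefix(P, m, f);

    int i = 0;                        /* characters of P matched so far */
    for (int k = 0; k < n; k++) {     /* k scans T left to right, never backs up */
        while (i > 0 && T[k] != P[i])
            i = f[i - 1];             /* same as moving P by i - Prefix(i-1) */
        if (T[k] == P[i])
            i++;
        if (i == m) {                 /* occurrence ending at position k */
            if (found < max_pos)
                pos[found] = k - m + 1;
            found++;
            i = f[m - 1];             /* move P by m - Prefix(m-1) steps */
        }
    }
    return found;
}
```

Called with T = "acgccgcgagcgcgctcaaa" and P = "cgcgagcgcgc", the text pointer k only moves forward, and the pattern's implied position goes through 0, 1, 3 and 4, as in the slides.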

Time Complexity

Preprocessing phase in O(m) space and time complexity.

Searching phase in O(n+m) time complexity.


References

AHO, A.V., HOPCROFT, J.E., ULLMAN, J.D., 1974, The design and analysis of computer algorithms, 2nd Edition, Chapter 9, pp. 317--361, Addison-Wesley Publishing Company.

BEAUQUIER, D., BERSTEL, J., CHRÉTIENNE, P., 1992, Éléments d'algorithmique, Chapter 10, pp 337-377, Masson, Paris.

CROCHEMORE, M., 1997. Off-line serial exact string searching, in Pattern Matching Algorithms, ed. A. Apostolico and Z. Galil, Chapter 1, pp 1-53, Oxford University Press.

HANCART, C., 1992, Une analyse en moyenne de l'algorithme de Morris et Pratt et de ses raffinements, in Théorie des Automates et Applications, Actes des 2e Journées Franco-Belges, D. Krob ed., Rouen, France, 1991, PUR 176, Rouen, France, 99-110.

HANCART, C., 1993. Analyse exacte et en moyenne d'algorithmes de recherche d'un motif dans un texte, Ph. D. Thesis, University Paris 7, France.

MORRIS (Jr) J.H., PRATT V.R., 1970, A linear pattern-matching algorithm, Technical Report 40, University of California, Berkeley.

Knuth-Morris-Pratt Algorithm

KNUTH D.E., MORRIS (Jr) J.H., PRATT V.R., Fast pattern matching in strings, SIAM Journal on Computing 6(1), 1977, pp. 323-350.

• In the MP algorithm, there are two cases when moving the pattern from left to right.

[Figure: T contains w followed by y, and P contains w followed by x, where the mismatch is between x and y. In Case 1, w starts and ends with u, and P is moved so that its prefix u lines up with the suffix u of w in T.]

Case 1: There exists a proper suffix u of w which is equal to a prefix of w.

Case 2: No proper suffix of w is equal to a prefix of w.

• The KMP algorithm improves the MP algorithm, and it also has two cases.

[Figure: T contains w followed by y, and P contains w followed by x. In P, w starts with u followed by the character z and ends with u. In Case 1, P is moved so that its prefix u lines up with the suffix u of w in T, and z is compared with y.]

Case 1: There exists a proper suffix u of w which is equal to a prefix of w, and z ≠ x.

Case 2: No proper suffix of w is equal to a prefix of w such that z ≠ x.
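The extra condition z ≠ x can be folded into the preprocessing. The sketch below follows the usual textbook formulation of the KMP table; the name `build_kmp_table` and the convention of storing m + 1 entries with -1 as a sentinel are assumptions, chosen so that the pattern moves i - kmp[i] steps on a mismatch at location i.

```c
/* KMP table kmp[0..m] for the pattern P of length m.
 * kmp[i] is the location of P that is compared next against the same
 * text character after a mismatch at location i, or -1 if the text
 * pointer should advance instead. The pattern moves i - kmp[i] steps. */
void build_kmp_table(const char *P, int m, int *kmp)
{
    int i = 0;
    int j = -1;
    kmp[0] = -1;
    while (i < m) {
        while (j > -1 && P[i] != P[j])
            j = kmp[j];
        i++;
        j++;
        if (i < m && P[i] == P[j])
            kmp[i] = kmp[j];   /* z = x: this move would mismatch again, skip it */
        else
            kmp[i] = j;        /* z != x (or i == m): keep the MP value */
    }
}
```

For P = "bcbabcbaebcbabcba" this gives kmp[4] = -1, kmp[5] = 0 and kmp[8] = 4, so mismatches at locations 4, 5 and 8 move the pattern 5, 5 and 4 steps respectively.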

i          0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
P          b  c  b  a  b  c  b  a  e  b  c  b  a  b  c  b  a
KMPtable  -1  0 -1  1 -1  0 -1  1  4 -1  0 -1  1 -1  0 -1  1

T  b  c  b  a  c  c  b  a  e  b  c  b  a  b  c  b  a  …
P  b  c  b  a  b  c  b  a  e  b  c  b  a  b  c  b  a

Mismatch occurs at location 4 of P.

Move P (4 - KMPtable[4]) = 4 - (-1) = 5 steps.

T  b  c  b  a  c  c  b  a  e  b  c  b  a  b  c  b  a  …
P                 b  c  b  a  b  c  b  a  e  b  c  b  a  b  c  b  a

i          0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
P          b  c  b  a  b  c  b  a  e  b  c  b  a  b  c  b  a
KMPtable  -1  0 -1  1 -1  0 -1  1  4 -1  0 -1  1 -1  0 -1  1

T  b  c  b  a  b  a  b  a  e  b  c  b  a  b  c  b  a  …
P  b  c  b  a  b  c  b  a  e  b  c  b  a  b  c  b  a

Mismatch occurs at location 5 of P.

Move P (5 - KMPtable[5]) = 5 - 0 = 5 steps.

T  b  c  b  a  b  a  b  a  e  b  c  b  a  b  c  b  a  …
P                 b  c  b  a  b  c  b  a  e  b  c  b  a  b  c  b  a

i          0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
P          b  c  b  a  b  c  b  a  e  b  c  b  a  b  c  b  a
KMPtable  -1  0 -1  1 -1  0 -1  1  4 -1  0 -1  1 -1  0 -1  1

T  b  c  b  a  b  c  b  a  b  b  c  b  a  b  c  b  a  …
P  b  c  b  a  b  c  b  a  e  b  c  b  a  b  c  b  a

Mismatch occurs at location 8 of P.

Move P (8 - KMPtable[8]) = 8 - 4 = 4 steps.

T  b  c  b  a  b  c  b  a  b  b  c  b  a  b  c  b  a  …
P              b  c  b  a  b  c  b  a  e  b  c  b  a  b  c  b  a

Example

   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
T  a  c  a  c  g  a  g  c  a  c  c  g  c  g  c  t  c  a  a  a
P     c  a  c  g  a  g  c  a  c  g  c
      0  1  2  3  4  5  6  7  8  9 10

(MP Algorithm)

P                       c  a  c  g  a  g  c  a  c  g  c
                        0  1  2  3  4  5  6  7  8  9 10

(KMP Algorithm)

P                             c  a  c  g  a  g  c  a  c  g  c
                              0  1  2  3  4  5  6  7  8  9 10


Time Complexity

Preprocessing phase in O(m) space and time complexity.

Searching phase in O(n+m) time complexity.

References

• AHO, A.V., 1990, Algorithms for finding patterns in strings, in Handbook of Theoretical Computer Science, Volume A, Algorithms and complexity, J. van Leeuwen ed., Chapter 5, pp 255-300, Elsevier, Amsterdam.

• AOE, J.-I., 1994, Computer algorithms: string pattern matching strategies, IEEE Computer Society Press.

• BAASE, S., VAN GELDER, A., 1999, Computer Algorithms: Introduction to Design and Analysis, 3rd Edition, Chapter 11, Addison-Wesley Publishing Company.

• BAEZA-YATES R., NAVARRO G., RIBEIRO-NETO B., 1999, Indexing and Searching, in Modern Information Retrieval, Chapter 8, pp 191-228, Addison-Wesley.

• BEAUQUIER, D., BERSTEL, J., CHRÉTIENNE, P., 1992, Éléments d'algorithmique, Chapter 10, pp 337-377, Masson, Paris.

• CORMEN, T.H., LEISERSON, C.E., RIVEST, R.L., 1990. Introduction to Algorithms, Chapter 34, pp 853-885, MIT Press.

• CROCHEMORE, M., 1997. Off-line serial exact string searching, in Pattern Matching Algorithms, ed. A. Apostolico and Z. Galil, Chapter 1, pp 1-53, Oxford University Press.

• CROCHEMORE, M., HANCART, C., 1999, Pattern Matching in Strings, in Algorithms and Theory of Computation Handbook, M.J. Atallah ed., Chapter 11, pp 11-1--11-28, CRC Press Inc., Boca Raton, FL.

• CROCHEMORE, M., LECROQ, T., 1996, Pattern matching and text compression algorithms, in CRC Computer Science and Engineering Handbook, A. Tucker ed., Chapter 8, pp 162-202, CRC Press Inc., Boca Raton, FL.

• CROCHEMORE, M., RYTTER, W., 1994, Text Algorithms, Oxford University Press.

• GONNET, G.H., BAEZA-YATES, R.A., 1991. Handbook of Algorithms and Data Structures in Pascal and C, 2nd Edition, Chapter 7, pp. 251-288, Addison-Wesley Publishing Company.

References

• GOODRICH, M.T., TAMASSIA, R., 1998, Data Structures and Algorithms in JAVA, Chapter 11, pp 441-467, John Wiley & Sons.

• GUSFIELD, D., 1997, Algorithms on strings, trees, and sequences: Computer Science and Computational Biology, Cambridge University Press.

• HANCART, C., 1992, Une analyse en moyenne de l'algorithme de Morris et Pratt et de ses raffinements, in Théorie des Automates et Applications, Actes des 2e Journées Franco-Belges, D. Krob ed., Rouen, France, 1991, PUR 176, Rouen, France, 99-110.

• HANCART, C., 1993. Analyse exacte et en moyenne d'algorithmes de recherche d'un motif dans un texte, Ph. D. Thesis, University Paris 7, France.

• KNUTH D.E., MORRIS (Jr) J.H., PRATT V.R., 1977, Fast pattern matching in strings, SIAM Journal on Computing 6(1):323-350.

• SEDGEWICK, R., 1988, Algorithms, Chapter 19, pp. 277-292, Addison-Wesley Publishing Company.

• SEDGEWICK, R., 1988, Algorithms in C, Chapter 19, Addison-Wesley Publishing Company.

• SEDGEWICK, R., FLAJOLET, P., 1996, An Introduction to the Analysis of Algorithms, Chapter ?, pp. ??-??, Addison-Wesley Publishing Company.

• STEPHEN, G.A., 1994, String Searching Algorithms, World Scientific.

• WATSON, B.W., 1995, Taxonomies and Toolkits of Regular Language Algorithms, Ph. D. Thesis, Eindhoven University of Technology, The Netherlands.

• WIRTH, N., 1986, Algorithms & Data Structures, Chapter 1, pp. 17-72, Prentice-Hall.

Simon Algorithm

String matching algorithms and automata

SIMON I.

1st American Workshop on String Processing, pp 151-157 (1993)

• The KMP algorithm improves the MP algorithm, and it also has two cases.

[Figure: T contains w followed by y, and P contains w followed by x. In P, w starts with u followed by the character z and ends with u. In Case 1, P is moved so that its prefix u lines up with the suffix u of w in T, and z is compared with y.]

Case 1: There exists a proper suffix u of w which is equal to a prefix of w, and z ≠ x.

Case 2: No proper suffix of w is equal to a prefix of w such that z ≠ x.

• The Simon algorithm improves the KMP algorithm, and it also has two cases.

[Figure: T contains w followed by y, and P contains w followed by x. In P, w starts with u followed by the character z and ends with u. In Case 1, P is moved so that its prefix u lines up with the suffix u of w in T, and z lines up with y.]

Case 1: There exists a proper suffix u of w which is equal to a prefix of w, and z = y.

Case 2: No proper suffix of w is equal to a prefix of w such that z = y.

Example

   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
T  a  c  a  c  g  a  g  c  a  c  c  g  c  g  c  t  c  a  a  a
P     c  a  c  g  a  g  c  a  c  g  c
      0  1  2  3  4  5  6  7  8  9 10

(MP Algorithm)

P                       c  a  c  g  a  g  c  a  c  g  c
                        0  1  2  3  4  5  6  7  8  9 10

(KMP Algorithm)

P                             c  a  c  g  a  g  c  a  c  g  c
                              0  1  2  3  4  5  6  7  8  9 10

(Simon Algorithm)

P                                c  a  c  g  a  g  c  a  c  g  c
                                 0  1  2  3  4  5  6  7  8  9 10


References

• BEAUQUIER, D., BERSTEL, J., CHRÉTIENNE, P., 1992, Éléments d'algorithmique, Chapter 10, pp 337-377, Masson, Paris.

• CROCHEMORE, M., 1997. Off-line serial exact string searching, in Pattern Matching Algorithms, ed. A. Apostolico and Z. Galil, Chapter 1, pp 1-53, Oxford University Press.

• CROCHEMORE, M., HANCART, C., 1997. Automata for Matching Patterns, in Handbook of Formal Languages, Volume 2, Linear Modeling: Background and Application, G. Rozenberg and A. Salomaa ed., Chapter 9, pp 399-462, Springer-Verlag, Berlin.

• CROCHEMORE, M., RYTTER, W., 1994, Text Algorithms, Oxford University Press.

• HANCART, C., 1992, Une analyse en moyenne de l'algorithme de Morris et Pratt et de ses raffinements, in Théorie des Automates et Applications, Actes des 2e Journées Franco-Belges, D. Krob ed., Rouen, France, 1991, PUR 176, Rouen, France, 99-110.

• HANCART, C., 1993, On Simon's string searching algorithm, Inf. Process. Lett. 47(2):95-99.

• HANCART, C., 1993. Analyse exacte et en moyenne d'algorithmes de recherche d'un motif dans un texte, Ph. D. Thesis, University Paris 7, France.

• SIMON I., 1993, String matching algorithms and automata, in Proceedings of 1st American Workshop on String Processing, R.A. Baeza-Yates and N. Ziviani ed., pp 151-157, Universidade Federal de Minas Gerais, Brazil.

• SIMON, I., 1994, String matching algorithms and automata, in Results and Trends in Theoretical Computer Science, Graz, Austria, Karhumäki, Maurer and Rozenberg ed., pp 386-395, Lecture Notes in Computer Science 814, Springer Verlag.


Thanks for your attention.