compsci 220: automata and pattern matching

279
COMPSCI 220: Automata and Pattern Matching Cristian S. Calude Term 2 2009 COMPSCI 220: Automata and Pattern Matching 1 / 123

Upload: others

Post on 31-Jul-2022

12 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: COMPSCI 220: Automata and Pattern Matching

COMPSCI 220: Automata and Pattern Matching

Cristian S. Calude

Term 2 2009

COMPSCI 220: Automata and Pattern Matching 1 / 123

Page 2: COMPSCI 220: Automata and Pattern Matching

R. Hamming

“The purpose of computing is insight, not numbers.”

COMPSCI 220: Automata and Pattern Matching 2 / 123

Page 3: COMPSCI 220: Automata and Pattern Matching

Thanks to

Elena Calude, Michael Dinneen, Nick Hay, and Radu Nicolescu forstimulating discussions and critical comments.

COMPSCI 220: Automata and Pattern Matching 3 / 123

Page 4: COMPSCI 220: Automata and Pattern Matching

Outline

1 Finite machines

2 DFA

3 NFA

4 Minimisation of DFAs

5 Pattern matching

COMPSCI 220: Automata and Pattern Matching 4 / 123

Page 5: COMPSCI 220: Automata and Pattern Matching

Outline

Bibliography 3

T. H. Cormen, C. E. Leiserson, R. L. Rivest, C. Stein. Introductionto Algorithms, MIT Press, 2001 (second ed.)

M. Crochemore, W. Rytter. Jewels of Stringology, World Scientific,2002.

M. J. Dinneen, G. Gimel’farb, M. C. Wilson. Introduction toAlgorithms, Data Structures and Formal Languages, Prentice Hall,2009. (textbook)

JFLAP, http://www.jflap.org . (simulation software)

Regex simulator, http://osteele.com/tools/reanimator .

Animation of the Aho-Corasick Automaton,http://www-sr.informatik.uni-tuebingen.de/ ˜ buehler/AC/AC.ht

COMPSCI 220: Automata and Pattern Matching 5 / 123

Page 6: COMPSCI 220: Automata and Pattern Matching

Outline

Bibliography 3

T. H. Cormen, C. E. Leiserson, R. L. Rivest, C. Stein. Introductionto Algorithms, MIT Press, 2001 (second ed.)

M. Crochemore, W. Rytter. Jewels of Stringology, World Scientific,2002.

M. J. Dinneen, G. Gimel’farb, M. C. Wilson. Introduction toAlgorithms, Data Structures and Formal Languages, Prentice Hall,2009. (textbook)

JFLAP, http://www.jflap.org . (simulation software)

Regex simulator, http://osteele.com/tools/reanimator .

Animation of the Aho-Corasick Automaton,http://www-sr.informatik.uni-tuebingen.de/ ˜ buehler/AC/AC.ht

COMPSCI 220: Automata and Pattern Matching 5 / 123

Page 7: COMPSCI 220: Automata and Pattern Matching

Outline

Bibliography 3

T. H. Cormen, C. E. Leiserson, R. L. Rivest, C. Stein. Introductionto Algorithms, MIT Press, 2001 (second ed.)

M. Crochemore, W. Rytter. Jewels of Stringology, World Scientific,2002.

M. J. Dinneen, G. Gimel’farb, M. C. Wilson. Introduction toAlgorithms, Data Structures and Formal Languages, Prentice Hall,2009. (textbook)

JFLAP, http://www.jflap.org . (simulation software)

Regex simulator, http://osteele.com/tools/reanimator .

Animation of the Aho-Corasick Automaton,http://www-sr.informatik.uni-tuebingen.de/ ˜ buehler/AC/AC.ht

COMPSCI 220: Automata and Pattern Matching 5 / 123

Page 8: COMPSCI 220: Automata and Pattern Matching

Outline

Bibliography 3

T. H. Cormen, C. E. Leiserson, R. L. Rivest, C. Stein. Introductionto Algorithms, MIT Press, 2001 (second ed.)

M. Crochemore, W. Rytter. Jewels of Stringology, World Scientific,2002.

M. J. Dinneen, G. Gimel’farb, M. C. Wilson. Introduction toAlgorithms, Data Structures and Formal Languages, Prentice Hall,2009. (textbook)

JFLAP, http://www.jflap.org . (simulation software)

Regex simulator, http://osteele.com/tools/reanimator .

Animation of the Aho-Corasick Automaton,http://www-sr.informatik.uni-tuebingen.de/ ˜ buehler/AC/AC.ht

COMPSCI 220: Automata and Pattern Matching 5 / 123

Page 9: COMPSCI 220: Automata and Pattern Matching

Outline

Bibliography 3

T. H. Cormen, C. E. Leiserson, R. L. Rivest, C. Stein. Introductionto Algorithms, MIT Press, 2001 (second ed.)

M. Crochemore, W. Rytter. Jewels of Stringology, World Scientific,2002.

M. J. Dinneen, G. Gimel’farb, M. C. Wilson. Introduction toAlgorithms, Data Structures and Formal Languages, Prentice Hall,2009. (textbook)

JFLAP, http://www.jflap.org . (simulation software)

Regex simulator, http://osteele.com/tools/reanimator .

Animation of the Aho-Corasick Automaton,http://www-sr.informatik.uni-tuebingen.de/ ˜ buehler/AC/AC.ht

COMPSCI 220: Automata and Pattern Matching 5 / 123

Page 10: COMPSCI 220: Automata and Pattern Matching

Outline

Bibliography 3

T. H. Cormen, C. E. Leiserson, R. L. Rivest, C. Stein. Introductionto Algorithms, MIT Press, 2001 (second ed.)

M. Crochemore, W. Rytter. Jewels of Stringology, World Scientific,2002.

M. J. Dinneen, G. Gimel’farb, M. C. Wilson. Introduction toAlgorithms, Data Structures and Formal Languages, Prentice Hall,2009. (textbook)

JFLAP, http://www.jflap.org . (simulation software)

Regex simulator, http://osteele.com/tools/reanimator .

Animation of the Aho-Corasick Automaton,http://www-sr.informatik.uni-tuebingen.de/ ˜ buehler/AC/AC.ht

COMPSCI 220: Automata and Pattern Matching 5 / 123

Page 11: COMPSCI 220: Automata and Pattern Matching

Outline

Bibliography 3

T. H. Cormen, C. E. Leiserson, R. L. Rivest, C. Stein. Introductionto Algorithms, MIT Press, 2001 (second ed.)

M. Crochemore, W. Rytter. Jewels of Stringology, World Scientific,2002.

M. J. Dinneen, G. Gimel’farb, M. C. Wilson. Introduction toAlgorithms, Data Structures and Formal Languages, Prentice Hall,2009. (textbook)

JFLAP, http://www.jflap.org . (simulation software)

Regex simulator, http://osteele.com/tools/reanimator .

Animation of the Aho-Corasick Automaton,http://www-sr.informatik.uni-tuebingen.de/ ˜ buehler/AC/AC.ht

COMPSCI 220: Automata and Pattern Matching 5 / 123

Page 12: COMPSCI 220: Automata and Pattern Matching

Outline

Bibliography 3

T. H. Cormen, C. E. Leiserson, R. L. Rivest, C. Stein. Introductionto Algorithms, MIT Press, 2001 (second ed.)

M. Crochemore, W. Rytter. Jewels of Stringology, World Scientific,2002.

M. J. Dinneen, G. Gimel’farb, M. C. Wilson. Introduction toAlgorithms, Data Structures and Formal Languages, Prentice Hall,2009. (textbook)

JFLAP, http://www.jflap.org . (simulation software)

Regex simulator, http://osteele.com/tools/reanimator .

Animation of the Aho-Corasick Automaton,http://www-sr.informatik.uni-tuebingen.de/ ˜ buehler/AC/AC.ht

COMPSCI 220: Automata and Pattern Matching 5 / 123

Page 13: COMPSCI 220: Automata and Pattern Matching

Outline

Assignment and exam 4

Assignment 1: 21 August 2009; 8:30pm (ADB time)

Midterm test 18 August (in class): prepare all results discussed inclass and tutorials

COMPSCI 220: Automata and Pattern Matching 6 / 123

Page 14: COMPSCI 220: Automata and Pattern Matching

Outline

Assignment and exam 4

Assignment 1: 21 August 2009; 8:30pm (ADB time)

Midterm test 18 August (in class): prepare all results discussed inclass and tutorials

COMPSCI 220: Automata and Pattern Matching 6 / 123

Page 15: COMPSCI 220: Automata and Pattern Matching

Outline

Assignment and exam 4

Assignment 1: 21 August 2009; 8:30pm (ADB time)

Midterm test 18 August (in class): prepare all results discussed inclass and tutorials

COMPSCI 220: Automata and Pattern Matching 6 / 123

Page 16: COMPSCI 220: Automata and Pattern Matching

Finite machines

Question

Are there finite memory machines accepting as input finite binarysequences of any length and deciding whether the sequence has acertain property (for example, it has an even number of 0’s)?

Using “states” to remember the ‘property’ seems a good idea, but don’twe have to keep adding newer and newer ‘states’ as the input getslonger and longer?

Re-phrased: Is a finite memory enough? In general the answer seemsto be negative, but . . .

COMPSCI 220: Automata and Pattern Matching 7 / 123

Page 17: COMPSCI 220: Automata and Pattern Matching

Finite machines

Question

Are there finite memory machines accepting as input finite binarysequences of any length and deciding whether the sequence has acertain property (for example, it has an even number of 0’s)?

Using “states” to remember the ‘property’ seems a good idea, but don’twe have to keep adding newer and newer ‘states’ as the input getslonger and longer?

Re-phrased: Is a finite memory enough? In general the answer seemsto be negative, but . . .

COMPSCI 220: Automata and Pattern Matching 7 / 123

Page 18: COMPSCI 220: Automata and Pattern Matching

Finite machines

Question

Are there finite memory machines accepting as input finite binarysequences of any length and deciding whether the sequence has acertain property (for example, it has an even number of 0’s)?

Using “states” to remember the ‘property’ seems a good idea, but don’twe have to keep adding newer and newer ‘states’ as the input getslonger and longer?

Re-phrased: Is a finite memory enough? In general the answer seemsto be negative, but . . .

COMPSCI 220: Automata and Pattern Matching 7 / 123

Page 19: COMPSCI 220: Automata and Pattern Matching

Finite machines

A simple example

Probably the simplest finite machine operates a switch as follows:

So, if the switch is down, then the light goes on and if the switch is up,then the light goes off.

To this device, the switch position is an input and the light on/off is theoutput. The machine works with finitely many “states” for anysequence of modifications of the switch.

COMPSCI 220: Automata and Pattern Matching 8 / 123

Page 20: COMPSCI 220: Automata and Pattern Matching

Finite machines

A simple example

Probably the simplest finite machine operates a switch as follows:

So, if the switch is down, then the light goes on and if the switch is up,then the light goes off.

To this device, the switch position is an input and the light on/off is theoutput. The machine works with finitely many “states” for anysequence of modifications of the switch.

COMPSCI 220: Automata and Pattern Matching 8 / 123

Page 21: COMPSCI 220: Automata and Pattern Matching

DFA

Bits, bit-strings, alphabets

Bit strings: ε (empty string), 0, 1, 00, 01, 10, 11, . . .

B = {0, 1} and B∗ is the set of all binary strings

Strings can be concatenated: from x and y get xyThe length of the string x is denoted by |x |

◮ |ε| = 0, |0| = |1| = 1, |00| = |01| = |10| = |11| = 2, . . .◮ |xy | = |x | + |y |

Other alphabets: Σ = {a, b, c, d}, the set of 7-bit ASCII characters

Σ∗ is the set of all strings over the alphabet Σ

COMPSCI 220: Automata and Pattern Matching 9 / 123

Page 22: COMPSCI 220: Automata and Pattern Matching

DFA

Bits, bit-strings, alphabets

Bit strings: ε (empty string), 0, 1, 00, 01, 10, 11, . . .

B = {0, 1} and B∗ is the set of all binary strings

Strings can be concatenated: from x and y get xyThe length of the string x is denoted by |x |

◮ |ε| = 0, |0| = |1| = 1, |00| = |01| = |10| = |11| = 2, . . .◮ |xy | = |x | + |y |

Other alphabets: Σ = {a, b, c, d}, the set of 7-bit ASCII characters

Σ∗ is the set of all strings over the alphabet Σ

COMPSCI 220: Automata and Pattern Matching 9 / 123

Page 23: COMPSCI 220: Automata and Pattern Matching

DFA

Bits, bit-strings, alphabets

Bit strings: ε (empty string), 0, 1, 00, 01, 10, 11, . . .

B = {0, 1} and B∗ is the set of all binary strings

Strings can be concatenated: from x and y get xyThe length of the string x is denoted by |x |

◮ |ε| = 0, |0| = |1| = 1, |00| = |01| = |10| = |11| = 2, . . .◮ |xy | = |x | + |y |

Other alphabets: Σ = {a, b, c, d}, the set of 7-bit ASCII characters

Σ∗ is the set of all strings over the alphabet Σ

COMPSCI 220: Automata and Pattern Matching 9 / 123

Page 24: COMPSCI 220: Automata and Pattern Matching

DFA

Bits, bit-strings, alphabets

Bit strings: ε (empty string), 0, 1, 00, 01, 10, 11, . . .

B = {0, 1} and B∗ is the set of all binary strings

Strings can be concatenated: from x and y get xyThe length of the string x is denoted by |x |

◮ |ε| = 0, |0| = |1| = 1, |00| = |01| = |10| = |11| = 2, . . .◮ |xy | = |x | + |y |

Other alphabets: Σ = {a, b, c, d}, the set of 7-bit ASCII characters

Σ∗ is the set of all strings over the alphabet Σ

COMPSCI 220: Automata and Pattern Matching 9 / 123

Page 25: COMPSCI 220: Automata and Pattern Matching

DFA

Bits, bit-strings, alphabets

Bit strings: ε (empty string), 0, 1, 00, 01, 10, 11, . . .

B = {0, 1} and B∗ is the set of all binary strings

Strings can be concatenated: from x and y get xyThe length of the string x is denoted by |x |

◮ |ε| = 0, |0| = |1| = 1, |00| = |01| = |10| = |11| = 2, . . .◮ |xy | = |x | + |y |

Other alphabets: Σ = {a, b, c, d}, the set of 7-bit ASCII characters

Σ∗ is the set of all strings over the alphabet Σ

COMPSCI 220: Automata and Pattern Matching 9 / 123

Page 26: COMPSCI 220: Automata and Pattern Matching

DFA

Bits, bit-strings, alphabets

Bit strings: ε (empty string), 0, 1, 00, 01, 10, 11, . . .

B = {0, 1} and B∗ is the set of all binary strings

Strings can be concatenated: from x and y get xyThe length of the string x is denoted by |x |

◮ |ε| = 0, |0| = |1| = 1, |00| = |01| = |10| = |11| = 2, . . .◮ |xy | = |x | + |y |

Other alphabets: Σ = {a, b, c, d}, the set of 7-bit ASCII characters

Σ∗ is the set of all strings over the alphabet Σ

COMPSCI 220: Automata and Pattern Matching 9 / 123

Page 27: COMPSCI 220: Automata and Pattern Matching

DFA

Bits, bit-strings, alphabets

Bit strings: ε (empty string), 0, 1, 00, 01, 10, 11, . . .

B = {0, 1} and B∗ is the set of all binary strings

Strings can be concatenated: from x and y get xyThe length of the string x is denoted by |x |

◮ |ε| = 0, |0| = |1| = 1, |00| = |01| = |10| = |11| = 2, . . .◮ |xy | = |x | + |y |

Other alphabets: Σ = {a, b, c, d}, the set of 7-bit ASCII characters

Σ∗ is the set of all strings over the alphabet Σ

COMPSCI 220: Automata and Pattern Matching 9 / 123

Page 28: COMPSCI 220: Automata and Pattern Matching

DFA

Bits, bit-strings, alphabets

Bit strings: ε (empty string), 0, 1, 00, 01, 10, 11, . . .

B = {0, 1} and B∗ is the set of all binary strings

Strings can be concatenated: from x and y get xyThe length of the string x is denoted by |x |

◮ |ε| = 0, |0| = |1| = 1, |00| = |01| = |10| = |11| = 2, . . .◮ |xy | = |x | + |y |

Other alphabets: Σ = {a, b, c, d}, the set of 7-bit ASCII characters

Σ∗ is the set of all strings over the alphabet Σ

COMPSCI 220: Automata and Pattern Matching 9 / 123

Page 29: COMPSCI 220: Automata and Pattern Matching

DFA

Bits, bit-strings, alphabets

Bit strings: ε (empty string), 0, 1, 00, 01, 10, 11, . . .

B = {0, 1} and B∗ is the set of all binary strings

Strings can be concatenated: from x and y get xyThe length of the string x is denoted by |x |

◮ |ε| = 0, |0| = |1| = 1, |00| = |01| = |10| = |11| = 2, . . .◮ |xy | = |x | + |y |

Other alphabets: Σ = {a, b, c, d}, the set of 7-bit ASCII characters

Σ∗ is the set of all strings over the alphabet Σ

COMPSCI 220: Automata and Pattern Matching 9 / 123

Page 30: COMPSCI 220: Automata and Pattern Matching

DFA

Bits, bit-strings, alphabets

Bit strings: ε (empty string), 0, 1, 00, 01, 10, 11, . . .

B = {0, 1} and B∗ is the set of all binary strings

Strings can be concatenated: from x and y get xyThe length of the string x is denoted by |x |

◮ |ε| = 0, |0| = |1| = 1, |00| = |01| = |10| = |11| = 2, . . .◮ |xy | = |x | + |y |

Other alphabets: Σ = {a, b, c, d}, the set of 7-bit ASCII characters

Σ∗ is the set of all strings over the alphabet Σ

COMPSCI 220: Automata and Pattern Matching 9 / 123

Page 31: COMPSCI 220: Automata and Pattern Matching

DFA

Deterministic finite automata

A deterministic finite automaton (DFA, for short) is a five-tupleM = (Q,Σ, δ, s, F ) where

1 Q is the finite set of machine states2 Σ is the finite input alphabet3 δ is a transition function from Q × Σ to Q4 s ∈ Q is the start state5 F ⊆ Q is the accepting (final/membership) states.

COMPSCI 220: Automata and Pattern Matching 10 / 123

Page 32: COMPSCI 220: Automata and Pattern Matching

DFA

Deterministic finite automata

A deterministic finite automaton (DFA, for short) is a five-tupleM = (Q,Σ, δ, s, F ) where

1 Q is the finite set of machine states2 Σ is the finite input alphabet3 δ is a transition function from Q × Σ to Q4 s ∈ Q is the start state5 F ⊆ Q is the accepting (final/membership) states.

COMPSCI 220: Automata and Pattern Matching 10 / 123

Page 33: COMPSCI 220: Automata and Pattern Matching

DFA

Deterministic finite automata

A deterministic finite automaton (DFA, for short) is a five-tupleM = (Q,Σ, δ, s, F ) where

1 Q is the finite set of machine states2 Σ is the finite input alphabet3 δ is a transition function from Q × Σ to Q4 s ∈ Q is the start state5 F ⊆ Q is the accepting (final/membership) states.

COMPSCI 220: Automata and Pattern Matching 10 / 123

Page 34: COMPSCI 220: Automata and Pattern Matching

DFA

Deterministic finite automata

A deterministic finite automaton (DFA, for short) is a five-tupleM = (Q,Σ, δ, s, F ) where

1 Q is the finite set of machine states2 Σ is the finite input alphabet3 δ is a transition function from Q × Σ to Q4 s ∈ Q is the start state5 F ⊆ Q is the accepting (final/membership) states.

COMPSCI 220: Automata and Pattern Matching 10 / 123

Page 35: COMPSCI 220: Automata and Pattern Matching

DFA

Deterministic finite automata

A deterministic finite automaton (DFA, for short) is a five-tupleM = (Q,Σ, δ, s, F ) where

1 Q is the finite set of machine states2 Σ is the finite input alphabet3 δ is a transition function from Q × Σ to Q4 s ∈ Q is the start state5 F ⊆ Q is the accepting (final/membership) states.

COMPSCI 220: Automata and Pattern Matching 10 / 123

Page 36: COMPSCI 220: Automata and Pattern Matching

DFA

Deterministic finite automata

A deterministic finite automaton (DFA, for short) is a five-tupleM = (Q,Σ, δ, s, F ) where

1 Q is the finite set of machine states2 Σ is the finite input alphabet3 δ is a transition function from Q × Σ to Q4 s ∈ Q is the start state5 F ⊆ Q is the accepting (final/membership) states.

COMPSCI 220: Automata and Pattern Matching 10 / 123

Page 37: COMPSCI 220: Automata and Pattern Matching

DFA

DFA: example 1

M = (Q,Σ, δ, s, F ):

Q = {q0, q1, q2, q3}Σ = {a, b}

δ ΣQ a bq0 q1 q2

q1 q0 q3

q2 q3 q0

q3 q2 q1

s = q0

F = {q2}

q0

q1

q2

q3

a

a

b

b

b

b

a

a

COMPSCI 220: Automata and Pattern Matching 11 / 123

Page 38: COMPSCI 220: Automata and Pattern Matching

DFA

DFA: accepted strings and language

Let M = (Q,Σ, δ, s, F ) be a DFA and w = w1w2 · · ·wn be a string overΣ.

The trace (path) of the computation of w on M is the (unique)sequence of states

s1, s2, · · · , sn, sn+1such that

s1 = s, δ(s1, w1) = s2, . . . , δ(sn−1, wn−1) = sn, δ(sn, wn) = sn+1.

The string w is accepted (or recognised) by M if sn+1 ∈ F ;otherwise, w is rejected by M.

The language accepted by M, denoted by L(M), is the set of allaccepted strings by M; if A = L(M), for some DFA M, then A iscalled regular.

COMPSCI 220: Automata and Pattern Matching 12 / 123

Page 39: COMPSCI 220: Automata and Pattern Matching

DFA

DFA: accepted strings and language

Let M = (Q,Σ, δ, s, F ) be a DFA and w = w1w2 · · ·wn be a string overΣ.

The trace (path) of the computation of w on M is the (unique)sequence of states

s1, s2, · · · , sn, sn+1such that

s1 = s, δ(s1, w1) = s2, . . . , δ(sn−1, wn−1) = sn, δ(sn, wn) = sn+1.

The string w is accepted (or recognised) by M if sn+1 ∈ F ;otherwise, w is rejected by M.

The language accepted by M, denoted by L(M), is the set of allaccepted strings by M; if A = L(M), for some DFA M, then A iscalled regular.

COMPSCI 220: Automata and Pattern Matching 12 / 123

Page 40: COMPSCI 220: Automata and Pattern Matching

DFA

DFA: accepted strings and language

Let M = (Q,Σ, δ, s, F ) be a DFA and w = w1w2 · · ·wn be a string overΣ.

The trace (path) of the computation of w on M is the (unique)sequence of states

s1, s2, · · · , sn, sn+1such that

s1 = s, δ(s1, w1) = s2, . . . , δ(sn−1, wn−1) = sn, δ(sn, wn) = sn+1.

The string w is accepted (or recognised) by M if sn+1 ∈ F ;otherwise, w is rejected by M.

The language accepted by M, denoted by L(M), is the set of allaccepted strings by M; if A = L(M), for some DFA M, then A iscalled regular.

COMPSCI 220: Automata and Pattern Matching 12 / 123

Page 41: COMPSCI 220: Automata and Pattern Matching

DFA

DFA: accepted strings and language

Let M = (Q,Σ, δ, s, F ) be a DFA and w = w1w2 · · ·wn be a string overΣ.

The trace (path) of the computation of w on M is the (unique)sequence of states

s1, s2, · · · , sn, sn+1such that

s1 = s, δ(s1, w1) = s2, . . . , δ(sn−1, wn−1) = sn, δ(sn, wn) = sn+1.

The string w is accepted (or recognised) by M if sn+1 ∈ F ;otherwise, w is rejected by M.

The language accepted by M, denoted by L(M), is the set of allaccepted strings by M; if A = L(M), for some DFA M, then A iscalled regular.

COMPSCI 220: Automata and Pattern Matching 12 / 123

Page 42: COMPSCI 220: Automata and Pattern Matching

DFA

DFA: accepted strings and language

Let M = (Q,Σ, δ, s, F ) be a DFA and w = w1w2 · · ·wn be a string overΣ.

The trace (path) of the computation of w on M is the (unique)sequence of states

s1, s2, · · · , sn, sn+1such that

s1 = s, δ(s1, w1) = s2, . . . , δ(sn−1, wn−1) = sn, δ(sn, wn) = sn+1.

The string w is accepted (or recognised) by M if sn+1 ∈ F ;otherwise, w is rejected by M.

The language accepted by M, denoted by L(M), is the set of allaccepted strings by M; if A = L(M), for some DFA M, then A iscalled regular.

COMPSCI 220: Automata and Pattern Matching 12 / 123

Page 43: COMPSCI 220: Automata and Pattern Matching

DFA

Questions

Given a DFA M, check which strings M accepts.

Given a language (set of strings) can we build a DFA M thatrecognises just them? If the answer is affirmative can weconstruct a minimal (in the sense of the number of states) DFArecognising the language?

Which properties of DFAs can be checked algorithmically?

COMPSCI 220: Automata and Pattern Matching 13 / 123

Page 44: COMPSCI 220: Automata and Pattern Matching

DFA

Questions

Given a DFA M, check which strings M accepts.

Given a language (set of strings) can we build a DFA M thatrecognises just them? If the answer is affirmative can weconstruct a minimal (in the sense of the number of states) DFArecognising the language?

Which properties of DFAs can be checked algorithmically?

COMPSCI 220: Automata and Pattern Matching 13 / 123

Page 45: COMPSCI 220: Automata and Pattern Matching

DFA

Questions

Given a DFA M, check which strings M accepts.

Given a language (set of strings) can we build a DFA M thatrecognises just them? If the answer is affirmative can weconstruct a minimal (in the sense of the number of states) DFArecognising the language?

Which properties of DFAs can be checked algorithmically?

COMPSCI 220: Automata and Pattern Matching 13 / 123

Page 46: COMPSCI 220: Automata and Pattern Matching

DFA

Questions

Given a DFA M, check which strings M accepts.

Given a language (set of strings) can we build a DFA M thatrecognises just them? If the answer is affirmative can weconstruct a minimal (in the sense of the number of states) DFArecognising the language?

Which properties of DFAs can be checked algorithmically?

COMPSCI 220: Automata and Pattern Matching 13 / 123

Page 47: COMPSCI 220: Automata and Pattern Matching

DFA

DFA: example 2

The language accepted bythis DFA is empty, i.e. theDFA accepts no string.

q0

a, b

COMPSCI 220: Automata and Pattern Matching 14 / 123

Page 48: COMPSCI 220: Automata and Pattern Matching

DFA

DFA: example 3

The language accepted bythis DFA consists of all stringsover Σ = {a, b}, i.e. the lan-guage Σ∗ = {a, b}∗.

q0

a, b

COMPSCI 220: Automata and Pattern Matching 15 / 123

Page 49: COMPSCI 220: Automata and Pattern Matching

DFA

DFA: example 1 continued

The language ac-cepted by thisDFA consists ofall strings overΣ = {a, b} whichcontain an evennumber of a’s andan odd number ofb’s.

q0

q1

q2

q3

a

a

b

b

b

b

a

a

COMPSCI 220: Automata and Pattern Matching 16 / 123

Page 50: COMPSCI 220: Automata and Pattern Matching

DFA

DFA: example 4

The language ac-cepted by this DFAconsists of all stringsover Σ = {a, b}which contain thesubstring aba, i.e.all the strings ofthe form uabav withu, v ∈ {a, b}∗.

q0

b

q1

a

q2

q3

a, ba

b

b

a

COMPSCI 220: Automata and Pattern Matching 17 / 123

Page 51: COMPSCI 220: Automata and Pattern Matching

DFA

DFA: example 5

The language ac-cepted by this DFAconsists of all stringsover Σ = {a, b}which start with a,i.e. all the strings ofthe form av , with v ∈Σ∗ = {a, b}∗.

q0

q1

a, b

q2

a, b

a

b

COMPSCI 220: Automata and Pattern Matching 18 / 123

Page 52: COMPSCI 220: Automata and Pattern Matching

DFA

DFA: example 6

The languageaccepted bythis DFA con-sists of onlyone string overΣ = {a, b},namelyabbab.

q0 q1 q2 q3 q4 q5

q6

a b b a b

ba

a b a a, b

a, b

COMPSCI 220: Automata and Pattern Matching 19 / 123

Page 53: COMPSCI 220: Automata and Pattern Matching

DFA

DFA: example 7

The language acceptedby this DFA is{ambn | m, n > 0},where am meansaa · · · a (m times).

q0 q1 q2

q3

a b

ab

a b

a, b

COMPSCI 220: Automata and Pattern Matching 20 / 123

Page 54: COMPSCI 220: Automata and Pattern Matching

DFA

Not all languages are accepted by DFAs

The languageL = {anbn | n > 0}

is not accepted by any DFA.

Why?

Informally, because a DFA can ‘count’ only up to the number of itsstates.

More formally, because, if n is greater than the number of states of aDFA supposed to accept L, then any trace (path) labelled by an passestwice through some state. That is, the there are strings ai and aj fori < j ≤ n that fall into the same state. Thus both aibi and ajbi areaccepted/rejected which contradicts the definition of L.

COMPSCI 220: Automata and Pattern Matching 21 / 123

Page 55: COMPSCI 220: Automata and Pattern Matching

DFA

Simple properties of DFAs 1

The complement of a regular language is also regular.Proof: if A = L(M), where M = (Q,Σ, δ, s, F ), then itscomplement, A = L(M ′), where M ′ = (Q,Σ, δ, s, F ).

It is algorithmically decidable whether a DFA M accepts the emptystring.Proof: If M = (Q,Σ, δ, s, F ), then ε ∈ L(M) if and only if s ∈ F .

It is algorithmically decidable whether a DFA M accepts a string w .Proof: Construct the trace of the computation of w on M andcheck whether its last state is final.

COMPSCI 220: Automata and Pattern Matching 22 / 123

Page 56: COMPSCI 220: Automata and Pattern Matching

DFA

Simple properties of DFAs 1

The complement of a regular language is also regular.Proof: if A = L(M), where M = (Q,Σ, δ, s, F ), then itscomplement, A = L(M ′), where M ′ = (Q,Σ, δ, s, F ).

It is algorithmically decidable whether a DFA M accepts the emptystring.Proof: If M = (Q,Σ, δ, s, F ), then ε ∈ L(M) if and only if s ∈ F .

It is algorithmically decidable whether a DFA M accepts a string w .Proof: Construct the trace of the computation of w on M andcheck whether its last state is final.

COMPSCI 220: Automata and Pattern Matching 22 / 123

Page 57: COMPSCI 220: Automata and Pattern Matching

DFA

Simple properties of DFAs 1

The complement of a regular language is also regular.Proof: if A = L(M), where M = (Q,Σ, δ, s, F ), then itscomplement, A = L(M ′), where M ′ = (Q,Σ, δ, s, F ).

It is algorithmically decidable whether a DFA M accepts the emptystring.Proof: If M = (Q,Σ, δ, s, F ), then ε ∈ L(M) if and only if s ∈ F .

It is algorithmically decidable whether a DFA M accepts a string w .Proof: Construct the trace of the computation of w on M andcheck whether its last state is final.

COMPSCI 220: Automata and Pattern Matching 22 / 123

Page 58: COMPSCI 220: Automata and Pattern Matching

DFA

Simple properties of DFAs 1

The complement of a regular language is also regular.Proof: if A = L(M), where M = (Q,Σ, δ, s, F ), then itscomplement, A = L(M ′), where M ′ = (Q,Σ, δ, s, F ).

It is algorithmically decidable whether a DFA M accepts the emptystring.Proof: If M = (Q,Σ, δ, s, F ), then ε ∈ L(M) if and only if s ∈ F .

It is algorithmically decidable whether a DFA M accepts a string w .Proof: Construct the trace of the computation of w on M andcheck whether its last state is final.

COMPSCI 220: Automata and Pattern Matching 22 / 123

Page 59: COMPSCI 220: Automata and Pattern Matching

DFA

Simple properties of DFAs 1

The complement of a regular language is also regular.Proof: if A = L(M), where M = (Q,Σ, δ, s, F ), then itscomplement, A = L(M ′), where M ′ = (Q,Σ, δ, s, F ).

It is algorithmically decidable whether a DFA M accepts the emptystring.Proof: If M = (Q,Σ, δ, s, F ), then ε ∈ L(M) if and only if s ∈ F .

It is algorithmically decidable whether a DFA M accepts a string w .Proof: Construct the trace of the computation of w on M andcheck whether its last state is final.

COMPSCI 220: Automata and Pattern Matching 22 / 123

Page 60: COMPSCI 220: Automata and Pattern Matching

DFA

Simple properties of DFAs 1

The complement of a regular language is also regular.Proof: if A = L(M), where M = (Q,Σ, δ, s, F ), then itscomplement, A = L(M ′), where M ′ = (Q,Σ, δ, s, F ).

It is algorithmically decidable whether a DFA M accepts the emptystring.Proof: If M = (Q,Σ, δ, s, F ), then ε ∈ L(M) if and only if s ∈ F .

It is algorithmically decidable whether a DFA M accepts a string w .Proof: Construct the trace of the computation of w on M andcheck whether its last state is final.

COMPSCI 220: Automata and Pattern Matching 22 / 123

Page 61: COMPSCI 220: Automata and Pattern Matching

DFA

Simple properties of DFAs 2

It is algorithmically decidable whether a DFA M accepts no string.Proof: Given the DFA M check whether there is a path from theinitial state s (has a trace of a computation) to a final state in F .We have: L(M) = ∅ if and only if there there is no path from theinitial state to a final state.

COMPSCI 220: Automata and Pattern Matching 23 / 123

Page 62: COMPSCI 220: Automata and Pattern Matching

DFA

Simple properties of DFAs 2

It is algorithmically decidable whether a DFA M accepts no string.Proof: Given the DFA M check whether there is a path from theinitial state s (has a trace of a computation) to a final state in F .We have: L(M) = ∅ if and only if there there is no path from theinitial state to a final state.

COMPSCI 220: Automata and Pattern Matching 23 / 123

Page 63: COMPSCI 220: Automata and Pattern Matching

DFA

Simple properties of DFAs 3

It is algorithmically decidable whether a DFA M accepts infinitelystrings.Proof: Given the DFA M, L(M) is infinite if and only if there is apath from the initial state (has a trace of a computation) s to a finalstate in F having the following additional property: some state q inthe path possesses a loop, i.e. there is a path from q to q.

COMPSCI 220: Automata and Pattern Matching 24 / 123

Page 64: COMPSCI 220: Automata and Pattern Matching

DFA

Simple properties of DFAs 3

It is algorithmically decidable whether a DFA M accepts infinitelystrings.Proof: Given the DFA M, L(M) is infinite if and only if there is apath from the initial state (has a trace of a computation) s to a finalstate in F having the following additional property: some state q inthe path possesses a loop, i.e. there is a path from q to q.

COMPSCI 220: Automata and Pattern Matching 24 / 123

Page 65: COMPSCI 220: Automata and Pattern Matching

NFA

The reverse operation

The reverse of a string

w = c1c2c3 · · · cn

is the stringR(w) = cncn−1 · · · c2c1.

For example, R(abaaa) = aaaba, R(abba) = abba, R(bac) = cab.

The reverse of a language A is the language

R(A) = {R(w) | w ∈ A}.

Problem: Is R(A) regular whenever A is regular?

COMPSCI 220: Automata and Pattern Matching 25 / 123

Page 66: COMPSCI 220: Automata and Pattern Matching

NFA

DFA: example 7 revisited

The language accepted bythe DFA M isA = {ambn | m, n > 0}.IsR(A) = {bnam | m, n > 0}regular?

q0 q1 q2

q3

a b

ab

a b

a, b

COMPSCI 220: Automata and Pattern Matching 26 / 123

Page 67: COMPSCI 220: Automata and Pattern Matching

NFA

A possible solution?

IsR(A) = {bman | m, n > 0}accepted by thismachine, M ′?

q0 q1 q2

q3

a ba

b

a b

a, b

COMPSCI 220: Automata and Pattern Matching 27 / 123

Page 68: COMPSCI 220: Automata and Pattern Matching

NFA

The solution ‘under microscope’: M vs M ′ 1

q0 q1 q2

q3

a b

ab

a b

a, b

q0 q1 q2

q3

a ba

b

a b

a, b

COMPSCI 220: Automata and Pattern Matching 28 / 123

Page 69: COMPSCI 220: Automata and Pattern Matching

NFA

The solution ‘under microscope’: M vs M ′ 2

What did we do, in more general terms?

1 The initial state of M becomes the accept state of M ′.2 Every accept state of M becomes an initial state of M ′.3 If δ(q1, c) = q2 is in M then δ(q2, c) = q1 is in M ′. That is, all

transitions are reversed.

COMPSCI 220: Automata and Pattern Matching 29 / 123

Page 70: COMPSCI 220: Automata and Pattern Matching

NFA

The solution ‘under microscope’: M vs M ′ 2

What did we do, in more general terms?

1 The initial state of M becomes the accept state of M ′.2 Every accept state of M becomes an initial state of M ′.3 If δ(q1, c) = q2 is in M then δ(q2, c) = q1 is in M ′. That is, all

transitions are reversed.

COMPSCI 220: Automata and Pattern Matching 29 / 123

Page 71: COMPSCI 220: Automata and Pattern Matching

NFA

The solution ‘under microscope’: M vs M ′ 2

What did we do, in more general terms?

1 The initial state of M becomes the accept state of M ′.2 Every accept state of M becomes an initial state of M ′.3 If δ(q1, c) = q2 is in M then δ(q2, c) = q1 is in M ′. That is, all

transitions are reversed.

COMPSCI 220: Automata and Pattern Matching 29 / 123

Page 72: COMPSCI 220: Automata and Pattern Matching

NFA

The solution ‘under microscope’: M vs. M ′ 3

Do we have a problem with M ′?Answer: yes: M ′ is not a DFA!

Still, the procedure seems reasonable!

What should we do? Well, let’s examine another example.

COMPSCI 220: Automata and Pattern Matching 30 / 123

Page 73: COMPSCI 220: Automata and Pattern Matching

NFA

The solution ‘under microscope’: M vs. M ′ 3

Do we have a problem with M ′?Answer: yes: M ′ is not a DFA!

Still, the procedure seems reasonable!

What should we do? Well, let’s examine another example.

COMPSCI 220: Automata and Pattern Matching 30 / 123

Page 74: COMPSCI 220: Automata and Pattern Matching

NFA

The solution ‘under microscope’: M vs. M ′ 3

Do we have a problem with M ′?Answer: yes: M ′ is not a DFA!

Still, the procedure seems reasonable!

What should we do? Well, let’s examine another example.

COMPSCI 220: Automata and Pattern Matching 30 / 123

Page 75: COMPSCI 220: Automata and Pattern Matching

NFA

The solution ‘under microscope’: M vs. M ′ 3

Do we have a problem with M ′?Answer: yes: M ′ is not a DFA!

Still, the procedure seems reasonable!

What should we do? Well, let’s examine another example.

COMPSCI 220: Automata and Pattern Matching 30 / 123

Page 76: COMPSCI 220: Automata and Pattern Matching

NFA

The solution ‘under microscope’ 4

Transforming this DFA Minto M ′ produces:a) two initial states: q2, q3

b) multiple transitionswith the same label (e.g.δ(q4, 0) = {q1, q2, q3, q4})

q0 q1 q2

q3 q4

0

1 0

1

0, 10, 1

0, 1

COMPSCI 220: Automata and Pattern Matching 31 / 123

Page 77: COMPSCI 220: Automata and Pattern Matching

NFA

Nondeterministic finite automata

Should we abandon the transformation M → M ′?

No. We turn it into a new concept!

A nondeterministic finite automaton (NFA, for short) is a five-tupleN = (Q,Σ, δ, S, F ) where

1 Q is the finite set of machine states2 Σ is the finite input alphabet3 δ is a function from Q × Σ to 2Q, the set of subsets of Q4 S ⊆ Q is a set of start (initial) states5 F ⊆ Q is the accepting (final/membership) states.

Informally, an NFA accepts a string w if there exists a(nondeterministic) trace (path) following the transition function δ oninput w from an initial state to an accept state.

COMPSCI 220: Automata and Pattern Matching 32 / 123

Page 78: COMPSCI 220: Automata and Pattern Matching

NFA

Nondeterministic finite automata

Should we abandon the transformation M → M ′?

No. We turn it into a new concept!

A nondeterministic finite automaton (NFA, for short) is a five-tupleN = (Q,Σ, δ, S, F ) where

1 Q is the finite set of machine states2 Σ is the finite input alphabet3 δ is a function from Q × Σ to 2Q, the set of subsets of Q4 S ⊆ Q is a set of start (initial) states5 F ⊆ Q is the accepting (final/membership) states.

Informally, an NFA accepts a string w if there exists a(nondeterministic) trace (path) following the transition function δ oninput w from an initial state to an accept state.

COMPSCI 220: Automata and Pattern Matching 32 / 123

Page 79: COMPSCI 220: Automata and Pattern Matching

NFA

Nondeterministic finite automata

Should we abandon the transformation M → M ′?

No. We turn it into a new concept!

A nondeterministic finite automaton (NFA, for short) is a five-tupleN = (Q,Σ, δ, S, F ) where

1 Q is the finite set of machine states2 Σ is the finite input alphabet3 δ is a function from Q × Σ to 2Q, the set of subsets of Q4 S ⊆ Q is a set of start (initial) states5 F ⊆ Q is the accepting (final/membership) states.

Informally, an NFA accepts a string w if there exists a(nondeterministic) trace (path) following the transition function δ oninput w from an initial state to an accept state.

COMPSCI 220: Automata and Pattern Matching 32 / 123

Page 80: COMPSCI 220: Automata and Pattern Matching

NFA

Nondeterministic finite automata

Should we abandon the transformation M → M ′?

No. We turn it into a new concept!

A nondeterministic finite automaton (NFA, for short) is a five-tupleN = (Q,Σ, δ, S, F ) where

1 Q is the finite set of machine states2 Σ is the finite input alphabet3 δ is a function from Q × Σ to 2Q, the set of subsets of Q4 S ⊆ Q is a set of start (initial) states5 F ⊆ Q is the accepting (final/membership) states.

Informally, an NFA accepts a string w if there exists a(nondeterministic) trace (path) following the transition function δ oninput w from an initial state to an accept state.

COMPSCI 220: Automata and Pattern Matching 32 / 123

Page 81: COMPSCI 220: Automata and Pattern Matching

NFA

Nondeterministic finite automata

Should we abandon the transformation M → M ′?

No. We turn it into a new concept!

A nondeterministic finite automaton (NFA, for short) is a five-tupleN = (Q,Σ, δ, S, F ) where

1 Q is the finite set of machine states2 Σ is the finite input alphabet3 δ is a function from Q × Σ to 2Q, the set of subsets of Q4 S ⊆ Q is a set of start (initial) states5 F ⊆ Q is the accepting (final/membership) states.

Informally, an NFA accepts a string w if there exists a(nondeterministic) trace (path) following the transition function δ oninput w from an initial state to an accept state.

COMPSCI 220: Automata and Pattern Matching 32 / 123

Page 82: COMPSCI 220: Automata and Pattern Matching

NFA

Nondeterministic finite automata

Should we abandon the transformation M → M ′?

No. We turn it into a new concept!

A nondeterministic finite automaton (NFA, for short) is a five-tupleN = (Q,Σ, δ, S, F ) where

1 Q is the finite set of machine states2 Σ is the finite input alphabet3 δ is a function from Q × Σ to 2Q, the set of subsets of Q4 S ⊆ Q is a set of start (initial) states5 F ⊆ Q is the accepting (final/membership) states.

Informally, an NFA accepts a string w if there exists a(nondeterministic) trace (path) following the transition function δ oninput w from an initial state to an accept state.

COMPSCI 220: Automata and Pattern Matching 32 / 123

Page 83: COMPSCI 220: Automata and Pattern Matching

NFA

Nondeterministic finite automata

Should we abandon the transformation M → M ′?

No. We turn it into a new concept!

A nondeterministic finite automaton (NFA, for short) is a five-tupleN = (Q,Σ, δ, S, F ) where

1 Q is the finite set of machine states2 Σ is the finite input alphabet3 δ is a function from Q × Σ to 2Q, the set of subsets of Q4 S ⊆ Q is a set of start (initial) states5 F ⊆ Q is the accepting (final/membership) states.

Informally, an NFA accepts a string w if there exists a(nondeterministic) trace (path) following the transition function δ oninput w from an initial state to an accept state.

COMPSCI 220: Automata and Pattern Matching 32 / 123

Page 84: COMPSCI 220: Automata and Pattern Matching

NFA

Nondeterministic finite automata

Should we abandon the transformation M → M ′?

No. We turn it into a new concept!

A nondeterministic finite automaton (NFA, for short) is a five-tupleN = (Q,Σ, δ, S, F ) where

1 Q is the finite set of machine states2 Σ is the finite input alphabet3 δ is a function from Q × Σ to 2Q, the set of subsets of Q4 S ⊆ Q is a set of start (initial) states5 F ⊆ Q is the accepting (final/membership) states.

Informally, an NFA accepts a string w if there exists a(nondeterministic) trace (path) following the transition function δ oninput w from an initial state to an accept state.

COMPSCI 220: Automata and Pattern Matching 32 / 123

Page 85: COMPSCI 220: Automata and Pattern Matching

NFA

NFA: accepted strings and language

Let N = (Q,Σ, δ, S, F ) be a NFA and w = w1w2 · · ·wn be a string overΣ.

A trace (path) of a computation of w on N is a sequence of states

s1, s2, · · · , sn, sn+1

such that

s2 ∈ δ(s1, w1), . . . , sn ∈ δ(sn−1, wn−1), sn+1 ∈ δ(sn, wn).

The string w is accepted (or recognised) by N if there is a traces1, s2, · · · , sn, sn+1 labelled by w such that s1 ∈ S and sn+1 ∈ F ;otherwise, w is rejected by N.

The language accepted by N, denoted by L(N), is the set of allaccepted strings by N.

COMPSCI 220: Automata and Pattern Matching 33 / 123

Page 86: COMPSCI 220: Automata and Pattern Matching

NFA

NFA: accepted strings and language

Let N = (Q,Σ, δ, S, F ) be a NFA and w = w1w2 · · ·wn be a string overΣ.

A trace (path) of a computation of w on N is a sequence of states

s1, s2, · · · , sn, sn+1

such that

s2 ∈ δ(s1, w1), . . . , sn ∈ δ(sn−1, wn−1), sn+1 ∈ δ(sn, wn).

The string w is accepted (or recognised) by N if there is a traces1, s2, · · · , sn, sn+1 labelled by w such that s1 ∈ S and sn+1 ∈ F ;otherwise, w is rejected by N.

The language accepted by N, denoted by L(N), is the set of allaccepted strings by N.

COMPSCI 220: Automata and Pattern Matching 33 / 123

Page 87: COMPSCI 220: Automata and Pattern Matching

NFA

NFA: accepted strings and language

Let N = (Q,Σ, δ, S, F ) be a NFA and w = w1w2 · · ·wn be a string overΣ.

A trace (path) of a computation of w on N is a sequence of states

s1, s2, · · · , sn, sn+1

such that

s2 ∈ δ(s1, w1), . . . , sn ∈ δ(sn−1, wn−1), sn+1 ∈ δ(sn, wn).

The string w is accepted (or recognised) by N if there is a traces1, s2, · · · , sn, sn+1 labelled by w such that s1 ∈ S and sn+1 ∈ F ;otherwise, w is rejected by N.

The language accepted by N, denoted by L(N), is the set of allaccepted strings by N.

COMPSCI 220: Automata and Pattern Matching 33 / 123

Page 88: COMPSCI 220: Automata and Pattern Matching

NFA

NFA: accepted strings and language

Let N = (Q,Σ, δ, S, F ) be a NFA and w = w1w2 · · ·wn be a string overΣ.

A trace (path) of a computation of w on N is a sequence of states

s1, s2, · · · , sn, sn+1

such that

s2 ∈ δ(s1, w1), . . . , sn ∈ δ(sn−1, wn−1), sn+1 ∈ δ(sn, wn).

The string w is accepted (or recognised) by N if there is a traces1, s2, · · · , sn, sn+1 labelled by w such that s1 ∈ S and sn+1 ∈ F ;otherwise, w is rejected by N.

The language accepted by N, denoted by L(N), is the set of allaccepted strings by N.

COMPSCI 220: Automata and Pattern Matching 33 / 123

Page 89: COMPSCI 220: Automata and Pattern Matching

NFA

NFA: accepted strings and language

Let N = (Q,Σ, δ, S, F ) be a NFA and w = w1w2 · · ·wn be a string overΣ.

A trace (path) of a computation of w on N is a sequence of states

s1, s2, · · · , sn, sn+1

such that

s2 ∈ δ(s1, w1), . . . , sn ∈ δ(sn−1, wn−1), sn+1 ∈ δ(sn, wn).

The string w is accepted (or recognised) by N if there is a traces1, s2, · · · , sn, sn+1 labelled by w such that s1 ∈ S and sn+1 ∈ F ;otherwise, w is rejected by N.

The language accepted by N, denoted by L(N), is the set of allaccepted strings by N.

COMPSCI 220: Automata and Pattern Matching 33 / 123

Page 90: COMPSCI 220: Automata and Pattern Matching

NFA

NFA: comments

The state transition function δ is more general for NFAs thanDFAs. Besides having transitions to multiple states for a giveninput symbol, we can have δ(q, c) empty (undefined) for someq ∈ Q and c ∈ Σ. This means that that we can design automatasuch that no state moves are possible for when in some state qand the next character read is c (that is, the human designer doesnot have to worry about all cases).

Every DFA can be viewed as a special case of an NFA.

COMPSCI 220: Automata and Pattern Matching 34 / 123

Page 91: COMPSCI 220: Automata and Pattern Matching

NFA

NFA: comments

The state transition function δ is more general for NFAs thanDFAs. Besides having transitions to multiple states for a giveninput symbol, we can have δ(q, c) empty (undefined) for someq ∈ Q and c ∈ Σ. This means that that we can design automatasuch that no state moves are possible for when in some state qand the next character read is c (that is, the human designer doesnot have to worry about all cases).

Every DFA can be viewed as a special case of an NFA.

COMPSCI 220: Automata and Pattern Matching 34 / 123

Page 92: COMPSCI 220: Automata and Pattern Matching

NFA

NFA: comments

The state transition function δ is more general for NFAs thanDFAs. Besides having transitions to multiple states for a giveninput symbol, we can have δ(q, c) empty (undefined) for someq ∈ Q and c ∈ Σ. This means that that we can design automatasuch that no state moves are possible for when in some state qand the next character read is c (that is, the human designer doesnot have to worry about all cases).

Every DFA can be viewed as a special case of an NFA.

COMPSCI 220: Automata and Pattern Matching 34 / 123

Page 93: COMPSCI 220: Automata and Pattern Matching

NFA

NFA: example 1 1

Σ = {a, b}

δ ΣStates a b

q0 {q0} {q0, q1}q1 {q2} {q2}q2 ∅ ∅

S = {q0}F = {q2}

q0

a, b

q1

q2

a, b

b

COMPSCI 220: Automata and Pattern Matching 35 / 123

Page 94: COMPSCI 220: Automata and Pattern Matching

NFA

NFA: example 1 2

The string aba is accepted: there are two traces,

q0a→q0

b→q0

a→q0,

q0a→q0

b→q1

a→q2

The string baa is not accepted: there are two traces,

q0b→q0

a→q0

a→q0,

q0b→q1

a→q2

a→?

The language accepted by this NFA is

{uba, ubb | u ∈ {a, b}∗}.

COMPSCI 220: Automata and Pattern Matching 36 / 123

Page 95: COMPSCI 220: Automata and Pattern Matching

NFA

NFA: example 1 2

The string aba is accepted: there are two traces,

q0a→q0

b→q0

a→q0,

q0a→q0

b→q1

a→q2

The string baa is not accepted: there are two traces,

q0b→q0

a→q0

a→q0,

q0b→q1

a→q2

a→?

The language accepted by this NFA is

{uba, ubb | u ∈ {a, b}∗}.

COMPSCI 220: Automata and Pattern Matching 36 / 123

Page 96: COMPSCI 220: Automata and Pattern Matching

NFA

NFA: example 1 2

The string aba is accepted: there are two traces,

q0a→q0

b→q0

a→q0,

q0a→q0

b→q1

a→q2

The string baa is not accepted: there are two traces,

q0b→q0

a→q0

a→q0,

q0b→q1

a→q2

a→?

The language accepted by this NFA is

{uba, ubb | u ∈ {a, b}∗}.

COMPSCI 220: Automata and Pattern Matching 36 / 123

Page 97: COMPSCI 220: Automata and Pattern Matching

NFA

NFA: example 2

Σ = {1, 2, 3}

δ ΣStates 1 2 3

q0 {q0, q1} {q0, q2} {q3}q1 {q1, q3} ∅ {q1}q2 {q2} {q2} {q2}q3 ∅ ∅ {q2}

S = {q0}F = {q2}

q0

1, 2

q1

1, 3

q2

q3

1, 2, 3

1

2

1

3

3

COMPSCI 220: Automata and Pattern Matching 37 / 123

Page 98: COMPSCI 220: Automata and Pattern Matching

NFA

NFA=DFA 1

Every NFA can be simulated by a DFA.

In fact, there is an algorithm which converts an NFA N into anequivalent DFA M, that is L(M) = L(N).

Idea: Create potentially a state in M for every subset of states of N. Inthe worst case, if N has n states, then M has 2n states.

Comment: Many of these states are not reachable so the algorithmoften terminates with a smaller DFA than the worst case.

Algorithm: NFAtoDFA is a method for constructing a DFA equivalentwith a given NFA.

Theorem: A language is regular if and only if it is recognised by anNFA.

COMPSCI 220: Automata and Pattern Matching 38 / 123

Page 99: COMPSCI 220: Automata and Pattern Matching

NFA

NFA=DFA 1

Every NFA can be simulated by a DFA.

In fact, there is an algorithm which converts an NFA N into anequivalent DFA M, that is L(M) = L(N).

Idea: Create potentially a state in M for every subset of states of N. Inthe worst case, if N has n states, then M has 2n states.

Comment: Many of these states are not reachable so the algorithmoften terminates with a smaller DFA than the worst case.

Algorithm: NFAtoDFA is a method for constructing a DFA equivalentwith a given NFA.

Theorem: A language is regular if and only if it is recognised by anNFA.

COMPSCI 220: Automata and Pattern Matching 38 / 123

Page 100: COMPSCI 220: Automata and Pattern Matching

NFA

NFA=DFA 1

Every NFA can be simulated by a DFA.

In fact, there is an algorithm which converts an NFA N into anequivalent DFA M, that is L(M) = L(N).

Idea: Create potentially a state in M for every subset of states of N. Inthe worst case, if N has n states, then M has 2n states.

Comment: Many of these states are not reachable so the algorithmoften terminates with a smaller DFA than the worst case.

Algorithm: NFAtoDFA is a method for constructing a DFA equivalentwith a given NFA.

Theorem: A language is regular if and only if it is recognised by anNFA.

COMPSCI 220: Automata and Pattern Matching 38 / 123

Page 101: COMPSCI 220: Automata and Pattern Matching

NFA

NFA=DFA 1

Every NFA can be simulated by a DFA.

In fact, there is an algorithm which converts an NFA N into anequivalent DFA M, that is L(M) = L(N).

Idea: Create potentially a state in M for every subset of states of N. Inthe worst case, if N has n states, then M has 2n states.

Comment: Many of these states are not reachable so the algorithmoften terminates with a smaller DFA than the worst case.

Algorithm: NFAtoDFA is a method for constructing a DFA equivalentwith a given NFA.

Theorem: A language is regular if and only if it is recognised by anNFA.

COMPSCI 220: Automata and Pattern Matching 38 / 123

Page 102: COMPSCI 220: Automata and Pattern Matching

NFA

NFA=DFA 1

Every NFA can be simulated by a DFA.

In fact, there is an algorithm which converts an NFA N into anequivalent DFA M, that is L(M) = L(N).

Idea: Create potentially a state in M for every subset of states of N. Inthe worst case, if N has n states, then M has 2n states.

Comment: Many of these states are not reachable so the algorithmoften terminates with a smaller DFA than the worst case.

Algorithm: NFAtoDFA is a method for constructing a DFA equivalentwith a given NFA.

Theorem: A language is regular if and only if it is recognised by anNFA.

COMPSCI 220: Automata and Pattern Matching 38 / 123

Page 103: COMPSCI 220: Automata and Pattern Matching

NFA

NFA=DFA 1

Every NFA can be simulated by a DFA.

In fact, there is an algorithm which converts an NFA N into anequivalent DFA M, that is L(M) = L(N).

Idea: Create potentially a state in M for every subset of states of N. Inthe worst case, if N has n states, then M has 2n states.

Comment: Many of these states are not reachable so the algorithmoften terminates with a smaller DFA than the worst case.

Algorithm: NFAtoDFA is a method for constructing a DFA equivalentwith a given NFA.

Theorem: A language is regular if and only if it is recognised by anNFA.

COMPSCI 220: Automata and Pattern Matching 38 / 123

Page 104: COMPSCI 220: Automata and Pattern Matching

NFA

NFA=DFA 2

Input: NFA N = (Q,Σ, δ, S, F )Output: DFA M = (QM ,Σ, δM , sM , FM)

The set of states of M is the set of all subsets of Q, QM = 2Q.

The transition from a set of states A on an element x ∈ Σ is theset of all states produces by N on each pair (q, x) with q ∈ A,δM(A, x) = {δ(q, x) | q ∈ A}.

The initial state sM of M is the set of all initial states of N, sM = S.

The accepting states FM of M is the set of states that have anaccepting state of N, FM = {A ⊆ Q | A ∩ F 6= ∅}.

Note: the algorithm NFAtoDFA follows the above construction, buteliminates all non-reachable states.

COMPSCI 220: Automata and Pattern Matching 39 / 123

Page 105: COMPSCI 220: Automata and Pattern Matching

NFA

NFA=DFA 2

Input: NFA N = (Q,Σ, δ, S, F )Output: DFA M = (QM ,Σ, δM , sM , FM)

The set of states of M is the set of all subsets of Q, QM = 2Q.

The transition from a set of states A on an element x ∈ Σ is theset of all states produces by N on each pair (q, x) with q ∈ A,δM(A, x) = {δ(q, x) | q ∈ A}.

The initial state sM of M is the set of all initial states of N, sM = S.

The accepting states FM of M is the set of states that have anaccepting state of N, FM = {A ⊆ Q | A ∩ F 6= ∅}.

Note: the algorithm NFAtoDFA follows the above construction, buteliminates all non-reachable states.

COMPSCI 220: Automata and Pattern Matching 39 / 123

Page 106: COMPSCI 220: Automata and Pattern Matching

NFA

NFA=DFA 2

Input: NFA N = (Q,Σ, δ, S, F )Output: DFA M = (QM ,Σ, δM , sM , FM)

The set of states of M is the set of all subsets of Q, QM = 2Q.

The transition from a set of states A on an element x ∈ Σ is theset of all states produces by N on each pair (q, x) with q ∈ A,δM(A, x) = {δ(q, x) | q ∈ A}.

The initial state sM of M is the set of all initial states of N, sM = S.

The accepting states FM of M is the set of states that have anaccepting state of N, FM = {A ⊆ Q | A ∩ F 6= ∅}.

Note: the algorithm NFAtoDFA follows the above construction, buteliminates all non-reachable states.

COMPSCI 220: Automata and Pattern Matching 39 / 123

Page 107: COMPSCI 220: Automata and Pattern Matching

NFA

NFA=DFA 2

Input: NFA N = (Q,Σ, δ, S, F )Output: DFA M = (QM ,Σ, δM , sM , FM)

The set of states of M is the set of all subsets of Q, QM = 2Q.

The transition from a set of states A on an element x ∈ Σ is theset of all states produces by N on each pair (q, x) with q ∈ A,δM(A, x) = {δ(q, x) | q ∈ A}.

The initial state sM of M is the set of all initial states of N, sM = S.

The accepting states FM of M is the set of states that have anaccepting state of N, FM = {A ⊆ Q | A ∩ F 6= ∅}.

Note: the algorithm NFAtoDFA follows the above construction, buteliminates all non-reachable states.

COMPSCI 220: Automata and Pattern Matching 39 / 123

Page 108: COMPSCI 220: Automata and Pattern Matching

NFA

NFA=DFA 2

Input: NFA N = (Q,Σ, δ, S, F )Output: DFA M = (QM ,Σ, δM , sM , FM)

The set of states of M is the set of all subsets of Q, QM = 2Q.

The transition from a set of states A on an element x ∈ Σ is theset of all states produces by N on each pair (q, x) with q ∈ A,δM(A, x) = {δ(q, x) | q ∈ A}.

The initial state sM of M is the set of all initial states of N, sM = S.

The accepting states FM of M is the set of states that have anaccepting state of N, FM = {A ⊆ Q | A ∩ F 6= ∅}.

Note: the algorithm NFAtoDFA follows the above construction, buteliminates all non-reachable states.

COMPSCI 220: Automata and Pattern Matching 39 / 123

Page 109: COMPSCI 220: Automata and Pattern Matching

NFA

NFAtoDFA: an example 1

q0

q1

q2

2

1

1, 2

11

The NFA N

COMPSCI 220: Automata and Pattern Matching 40 / 123

Page 110: COMPSCI 220: Automata and Pattern Matching

NFA

NFAtoDFA: an example 2

q′

0 = {q0} q′

1 = {q1, q2} q′

2 = {q0, q2}

q′

3 = {q2} q′

4 = {q0, q1, q2}

2 1

1

2

1

2

1

2 21

Equivalent DFA M

COMPSCI 220: Automata and Pattern Matching 41 / 123

Page 111: COMPSCI 220: Automata and Pattern Matching

NFA

Closure properties of regular languages 1

The union of two regular languages is also regular.Proof: Given two NFAs NA, NB with no common states such thatA = L(NA), B = L(NB), the NFA N consisting of the union of allcomponents of NA, NB recognises A ∪ B.

More precisely, if NA = (QA,Σ, δA, SA, FA) andNB = (QB,Σ, δB , SB, FB) with QA ∩ QB = ∅, then A ∪ B isrecognised by the NFA

N = (QA ∪ QB,Σ, δA ∪ δB , SA ∪ SB, FA ∪ FB).

The intersection of two regular languages is also regular.Proof: A ∩ B = A ∪ B.

COMPSCI 220: Automata and Pattern Matching 42 / 123

Page 112: COMPSCI 220: Automata and Pattern Matching

NFA

Closure properties of regular languages 1

The union of two regular languages is also regular.Proof: Given two NFAs NA, NB with no common states such thatA = L(NA), B = L(NB), the NFA N consisting of the union of allcomponents of NA, NB recognises A ∪ B.

More precisely, if NA = (QA,Σ, δA, SA, FA) andNB = (QB,Σ, δB , SB, FB) with QA ∩ QB = ∅, then A ∪ B isrecognised by the NFA

N = (QA ∪ QB,Σ, δA ∪ δB , SA ∪ SB, FA ∪ FB).

The intersection of two regular languages is also regular.Proof: A ∩ B = A ∪ B.

COMPSCI 220: Automata and Pattern Matching 42 / 123

Page 113: COMPSCI 220: Automata and Pattern Matching

NFA

Closure properties of regular languages 1

The union of two regular languages is also regular.Proof: Given two NFAs NA, NB with no common states such thatA = L(NA), B = L(NB), the NFA N consisting of the union of allcomponents of NA, NB recognises A ∪ B.

More precisely, if NA = (QA,Σ, δA, SA, FA) andNB = (QB,Σ, δB , SB, FB) with QA ∩ QB = ∅, then A ∪ B isrecognised by the NFA

N = (QA ∪ QB,Σ, δA ∪ δB , SA ∪ SB, FA ∪ FB).

The intersection of two regular languages is also regular.Proof: A ∩ B = A ∪ B.

COMPSCI 220: Automata and Pattern Matching 42 / 123

Page 114: COMPSCI 220: Automata and Pattern Matching

NFA

Closure properties of regular languages 1

The union of two regular languages is also regular.Proof: Given two NFAs NA, NB with no common states such thatA = L(NA), B = L(NB), the NFA N consisting of the union of allcomponents of NA, NB recognises A ∪ B.

More precisely, if NA = (QA,Σ, δA, SA, FA) andNB = (QB,Σ, δB , SB, FB) with QA ∩ QB = ∅, then A ∪ B isrecognised by the NFA

N = (QA ∪ QB,Σ, δA ∪ δB , SA ∪ SB, FA ∪ FB).

The intersection of two regular languages is also regular.Proof: A ∩ B = A ∪ B.

COMPSCI 220: Automata and Pattern Matching 42 / 123

Page 115: COMPSCI 220: Automata and Pattern Matching

NFA

Closure under union: an example 1

q2 q0 q1

a, b

a

COMPSCI 220: Automata and Pattern Matching 43 / 123

Page 116: COMPSCI 220: Automata and Pattern Matching

NFA

Closure under union: an example 2

q2

q0 q1

a, b

a

COMPSCI 220: Automata and Pattern Matching 44 / 123

Page 117: COMPSCI 220: Automata and Pattern Matching

NFA

Closure under intersection: an example 1

q0 q1

a

b

b

NFA N1

q3

b

NFA N2

COMPSCI 220: Automata and Pattern Matching 45 / 123

Page 118: COMPSCI 220: Automata and Pattern Matching

NFA

Closure under intersection: an example 2

q0 q1

a

b

b

NFA accepting the complement of N1?

COMPSCI 220: Automata and Pattern Matching 46 / 123

Page 119: COMPSCI 220: Automata and Pattern Matching

NFA

Closure under intersection: an example 3

q0 q1 q3

a

b a

b a, b

DFA M1 equivalent to N1

COMPSCI 220: Automata and Pattern Matching 47 / 123

Page 120: COMPSCI 220: Automata and Pattern Matching

NFA

Closure under intersection: an example 4

q0 q1 q3

a

b a

b a, b

DFA M1 recognising the complement of M1

COMPSCI 220: Automata and Pattern Matching 48 / 123

Page 121: COMPSCI 220: Automata and Pattern Matching

NFA

Closure under intersection: an example 5

q3 q4

b

a

a, b

DFA M2 equivalent to N2

COMPSCI 220: Automata and Pattern Matching 49 / 123

Page 122: COMPSCI 220: Automata and Pattern Matching

NFA

Closure under intersection: an example 6

q3 q4

b

a

a, b

DFA M2 recognising the complement of M2

COMPSCI 220: Automata and Pattern Matching 50 / 123

Page 123: COMPSCI 220: Automata and Pattern Matching

NFA

Closure under intersection: an example 7

q0 q1 q2

q3 q4

a

b a

b a, b

b

a

a, b

NFA N3 recognising L(M2) ∪ L(M2)

COMPSCI 220: Automata and Pattern Matching 51 / 123

Page 124: COMPSCI 220: Automata and Pattern Matching

NFA

Closure under intersection: an example 8

Last two steps:

Construct a DFA M3 equivalent to the NFA N3

Construct the complement ofL(M3) = L(N1) ∩ L(N2) = {bk | k ≥ 1}

Recap:

L(N1) = {anbm | n ≥ 0, m ≥ 1}

L(N2) = {bm | m ≥ 0}

L(M3) = {bk | k ≥ 1}

COMPSCI 220: Automata and Pattern Matching 52 / 123

Page 125: COMPSCI 220: Automata and Pattern Matching

NFA

Closure under intersection: an example 8

Last two steps:

Construct a DFA M3 equivalent to the NFA N3

Construct the complement ofL(M3) = L(N1) ∩ L(N2) = {bk | k ≥ 1}

Recap:

L(N1) = {anbm | n ≥ 0, m ≥ 1}

L(N2) = {bm | m ≥ 0}

L(M3) = {bk | k ≥ 1}

COMPSCI 220: Automata and Pattern Matching 52 / 123

Page 126: COMPSCI 220: Automata and Pattern Matching

NFA

Closure properties of regular languages 2

The closure (or Kleene star) of a language A, denoted by A∗, is the setof all strings that can be formed by concatenating together any finitenumber of strings of A.

Examples:

{a}∗ = {ε, a, aa, aaa, . . . , an, . . .}

{a, ab}∗ = {ε, a, ab, aa, abab, aab, aba, . . .}

The Kleene star of a regular language is also regular.Proof: Given an NFA NA that recognizes a language A we canbuild an NFA NA∗ that recognises the closure of A by making astart state accept state and, adding transitions, withcorresponding labels, from all accept state(s) to the neighbours ofthe initial state(s).

COMPSCI 220: Automata and Pattern Matching 53 / 123

Page 127: COMPSCI 220: Automata and Pattern Matching

NFA

Closure properties of regular languages 2

The closure (or Kleene star) of a language A, denoted by A∗, is the setof all strings that can be formed by concatenating together any finitenumber of strings of A.

Examples:

{a}∗ = {ε, a, aa, aaa, . . . , an, . . .}

{a, ab}∗ = {ε, a, ab, aa, abab, aab, aba, . . .}

The Kleene star of a regular language is also regular.Proof: Given an NFA NA that recognizes a language A we canbuild an NFA NA∗ that recognises the closure of A by making astart state accept state and, adding transitions, withcorresponding labels, from all accept state(s) to the neighbours ofthe initial state(s).

COMPSCI 220: Automata and Pattern Matching 53 / 123

Page 128: COMPSCI 220: Automata and Pattern Matching

NFA

Closure operation: an example

q0

q2

q1

q3

0, 1

0

1

0

1

0, 1

q0

q2

q1

q3

0, 1

0

1

0

1

0, 1

1

COMPSCI 220: Automata and Pattern Matching 54 / 123

Page 129: COMPSCI 220: Automata and Pattern Matching

NFA

Closure properties of regular languages 3

The concatenation of two languages A, B is defined to be the set ofstrings that can be formed by concatenating all strings of A with allstrings of B, i.e.

AB = {xy | x ∈ A, y ∈ B}.

Example: If A = {an | n ≥ 0} and B = {bw | w ∈ {a, b}∗}, then

AB = {anbw | w ∈ {a, b}∗, n ≥ 0} = {ubv | u, v ∈ {a, b}∗}.

COMPSCI 220: Automata and Pattern Matching 55 / 123

Page 130: COMPSCI 220: Automata and Pattern Matching

NFA

Closure under concatenation: an example 1.1

q0

a

q1 q2

a, b

b

COMPSCI 220: Automata and Pattern Matching 56 / 123

Page 131: COMPSCI 220: Automata and Pattern Matching

NFA

Closure under concatenation: an example 1.2

q0 q1 q2

a a, b

b?

COMPSCI 220: Automata and Pattern Matching 57 / 123

Page 132: COMPSCI 220: Automata and Pattern Matching

NFA

Closure under concatenation: an example 1.3

q0 q1 q2

a a, b

b

b

COMPSCI 220: Automata and Pattern Matching 58 / 123

Page 133: COMPSCI 220: Automata and Pattern Matching

NFA

Closure under concatenation: an example 1.3’

q0 = q1 q2

a a, b

b

COMPSCI 220: Automata and Pattern Matching 59 / 123

Page 134: COMPSCI 220: Automata and Pattern Matching

NFA

Closure under concatenation: an example 2.1

q0 q1

bq2 q3

a, b

a

COMPSCI 220: Automata and Pattern Matching 60 / 123

Page 135: COMPSCI 220: Automata and Pattern Matching

NFA

Closure under concatenation: an example 2.2

q0 q1 q2 q3b ?

?a

a, b

COMPSCI 220: Automata and Pattern Matching 61 / 123

Page 136: COMPSCI 220: Automata and Pattern Matching

NFA

Closure under concatenation: an example 2.3

q0

q1 = q2

q3

a, bb

?

a

COMPSCI 220: Automata and Pattern Matching 62 / 123

Page 137: COMPSCI 220: Automata and Pattern Matching

NFA

Closure under concatenation: an example 2.4

q0

q1 = q2

q3

a, bb a

COMPSCI 220: Automata and Pattern Matching 63 / 123

Page 138: COMPSCI 220: Automata and Pattern Matching

NFA

Closure under concatenation: an example 2.4’

q0

q1 = q2

q3

a, bb a

a

COMPSCI 220: Automata and Pattern Matching 64 / 123

Page 139: COMPSCI 220: Automata and Pattern Matching

NFA

Closure properties of regular languages 3

The concatenation of two regular languages is also regular.Proof: Given two NFAs NA = (QA,Σ, δA, SA, FA) andNB = (QB,Σ, δB , SB, FB), QA ∩ QB = ∅, recognising the languagesA, B, respectively, we can build an NFA N = (Q,Σ, δ, S, F ) thatrecognises the concatenation of A and B as follows:

◮ Q = QA ∪ QB◮ S = SA ∪ SB if one state of SA is a final state; otherwise, S = SA◮ F = FB◮

δ(q, c) =

δA(q, c), if q ∈ QA \ FA,

δB(q, c), if q ∈ QB \ SB ,

δA(q, c) ∪ {δB(q′, c) | q′ ∈ SB}, if q ∈ FA.

COMPSCI 220: Automata and Pattern Matching 65 / 123

Page 140: COMPSCI 220: Automata and Pattern Matching

NFA

Closure properties of regular languages 3

The concatenation of two regular languages is also regular.Proof: Given two NFAs NA = (QA,Σ, δA, SA, FA) andNB = (QB,Σ, δB , SB, FB), QA ∩ QB = ∅, recognising the languagesA, B, respectively, we can build an NFA N = (Q,Σ, δ, S, F ) thatrecognises the concatenation of A and B as follows:

◮ Q = QA ∪ QB◮ S = SA ∪ SB if one state of SA is a final state; otherwise, S = SA◮ F = FB◮

δ(q, c) =

δA(q, c), if q ∈ QA \ FA,

δB(q, c), if q ∈ QB \ SB ,

δA(q, c) ∪ {δB(q′, c) | q′ ∈ SB}, if q ∈ FA.

COMPSCI 220: Automata and Pattern Matching 65 / 123

Page 141: COMPSCI 220: Automata and Pattern Matching

NFA

Closure properties of regular languages 3

The concatenation of two regular languages is also regular.Proof: Given two NFAs NA = (QA,Σ, δA, SA, FA) andNB = (QB,Σ, δB , SB, FB), QA ∩ QB = ∅, recognising the languagesA, B, respectively, we can build an NFA N = (Q,Σ, δ, S, F ) thatrecognises the concatenation of A and B as follows:

◮ Q = QA ∪ QB◮ S = SA ∪ SB if one state of SA is a final state; otherwise, S = SA◮ F = FB◮

δ(q, c) =

δA(q, c), if q ∈ QA \ FA,

δB(q, c), if q ∈ QB \ SB ,

δA(q, c) ∪ {δB(q′, c) | q′ ∈ SB}, if q ∈ FA.

COMPSCI 220: Automata and Pattern Matching 65 / 123

Page 142: COMPSCI 220: Automata and Pattern Matching

NFA

Closure properties of regular languages 3

The concatenation of two regular languages is also regular.Proof: Given two NFAs NA = (QA,Σ, δA, SA, FA) andNB = (QB,Σ, δB , SB, FB), QA ∩ QB = ∅, recognising the languagesA, B, respectively, we can build an NFA N = (Q,Σ, δ, S, F ) thatrecognises the concatenation of A and B as follows:

◮ Q = QA ∪ QB◮ S = SA ∪ SB if one state of SA is a final state; otherwise, S = SA◮ F = FB◮

δ(q, c) =

δA(q, c), if q ∈ QA \ FA,

δB(q, c), if q ∈ QB \ SB ,

δA(q, c) ∪ {δB(q′, c) | q′ ∈ SB}, if q ∈ FA.

COMPSCI 220: Automata and Pattern Matching 65 / 123

Page 143: COMPSCI 220: Automata and Pattern Matching

NFA

Closure properties of regular languages 3

The concatenation of two regular languages is also regular.Proof: Given two NFAs NA = (QA,Σ, δA, SA, FA) andNB = (QB,Σ, δB , SB, FB), QA ∩ QB = ∅, recognising the languagesA, B, respectively, we can build an NFA N = (Q,Σ, δ, S, F ) thatrecognises the concatenation of A and B as follows:

◮ Q = QA ∪ QB◮ S = SA ∪ SB if one state of SA is a final state; otherwise, S = SA◮ F = FB◮

δ(q, c) =

δA(q, c), if q ∈ QA \ FA,

δB(q, c), if q ∈ QB \ SB ,

δA(q, c) ∪ {δB(q′, c) | q′ ∈ SB}, if q ∈ FA.

COMPSCI 220: Automata and Pattern Matching 65 / 123

Page 144: COMPSCI 220: Automata and Pattern Matching

NFA

Closure under repeated concatenation

Let A be a language and n ≥ 1. We define:

An = {x1x2 · · · xn | x1, x2, . . . , xn ∈ A}.

If A is a regular language, then for each n ≥ 1, An is also regular.Proof: A1 = A, A2 = AA, . . . , An = AA · · ·A

︸ ︷︷ ︸

n times

, so the result follows

from the closure under concatenation.

COMPSCI 220: Automata and Pattern Matching 66 / 123

Page 145: COMPSCI 220: Automata and Pattern Matching

NFA

Closure under repeated concatenation

Let A be a language and n ≥ 1. We define:

An = {x1x2 · · · xn | x1, x2, . . . , xn ∈ A}.

If A is a regular language, then for each n ≥ 1, An is also regular.Proof: A1 = A, A2 = AA, . . . , An = AA · · ·A

︸ ︷︷ ︸

n times

, so the result follows

from the closure under concatenation.

COMPSCI 220: Automata and Pattern Matching 66 / 123

Page 146: COMPSCI 220: Automata and Pattern Matching

NFA

More decidable properties of regular languages 1

It is algorithmically decidable whether two DFAs accept the samelanguage.Proof: If A, B are two languages recognised by the DFAs MA, MB,respectively, then (using the closure properties of regularlanguages) we can construct a DFA M such that:

L(M) = A ∆ B = (A ∩ B) ∪ (B ∩ A),

and then use the equivalence:

A = B ⇔ A ∆ B = ∅.

COMPSCI 220: Automata and Pattern Matching 67 / 123

Page 147: COMPSCI 220: Automata and Pattern Matching

NFA

More decidable properties of regular languages 1

It is algorithmically decidable whether two DFAs accept the samelanguage.Proof: If A, B are two languages recognised by the DFAs MA, MB,respectively, then (using the closure properties of regularlanguages) we can construct a DFA M such that:

L(M) = A ∆ B = (A ∩ B) ∪ (B ∩ A),

and then use the equivalence:

A = B ⇔ A ∆ B = ∅.

COMPSCI 220: Automata and Pattern Matching 67 / 123

Page 148: COMPSCI 220: Automata and Pattern Matching

NFA

More decidable properties of regular languages 1

It is algorithmically decidable whether two DFAs accept the samelanguage.Proof: If A, B are two languages recognised by the DFAs MA, MB,respectively, then (using the closure properties of regularlanguages) we can construct a DFA M such that:

L(M) = A ∆ B = (A ∩ B) ∪ (B ∩ A),

and then use the equivalence:

A = B ⇔ A ∆ B = ∅.

COMPSCI 220: Automata and Pattern Matching 67 / 123

Page 149: COMPSCI 220: Automata and Pattern Matching

NFA

More decidable properties of regular languages: an example 1.1

q0 q1

a

b

a, b

DFA M1

{anbu | n ≥ 0, u ∈ {a, b}∗}

q3

a, b

DFA M2

{a, b}∗

COMPSCI 220: Automata and Pattern Matching 68 / 123

Page 150: COMPSCI 220: Automata and Pattern Matching

NFA

More decidable properties of regular languages: an example 1.2

q0 q1

a

b

a, b

DFA M1

{an | n ≥ 0}

q3

a, b

DFA M2

COMPSCI 220: Automata and Pattern Matching 69 / 123

Page 151: COMPSCI 220: Automata and Pattern Matching

NFA

More decidable properties of regular languages: an example 1.3

{q0, q3} {q1, q3}

a

b

a, b

DFA M1 ∩ M2

{q0, q3} {q1, q3}

a

b

a, b

DFA M1 ∩ M2

{an | n ≥ 0}

COMPSCI 220: Automata and Pattern Matching 70 / 123

Page 152: COMPSCI 220: Automata and Pattern Matching

NFA

More decidable properties of regular languages: an example 1.4

{(q0, q3), (q0, q3)} {(q0, q3), (q1, q3)}

{(q1, q3), (q0, q3)} {(q1, q3), (q1, q3)}

a a

a a, b

b

b

b

DFA M1∆M2: {an | n ≥ 0} 6= ∅ implies L(M1) 6= L(M2)

COMPSCI 220: Automata and Pattern Matching 71 / 123

Page 153: COMPSCI 220: Automata and Pattern Matching

NFA

More decidable properties of regular languages 2

It is algorithmically decidable whether a DFA M accepts only onea string w .Proof: Take A = L(M) and B = {w}.

It is algorithmically decidable whether the language accepted by aDFA M includes the language accepted by a DFA M ′.Proof: We use the equivalence

L(M) ⊆ L(M ′) ⇔ L(M) ∩ L(M ′) = L(M).

COMPSCI 220: Automata and Pattern Matching 72 / 123

Page 154: COMPSCI 220: Automata and Pattern Matching

NFA

More decidable properties of regular languages 2

It is algorithmically decidable whether a DFA M accepts only onea string w .Proof: Take A = L(M) and B = {w}.

It is algorithmically decidable whether the language accepted by aDFA M includes the language accepted by a DFA M ′.Proof: We use the equivalence

L(M) ⊆ L(M ′) ⇔ L(M) ∩ L(M ′) = L(M).

COMPSCI 220: Automata and Pattern Matching 72 / 123

Page 155: COMPSCI 220: Automata and Pattern Matching

NFA

More decidable properties of regular languages 2

It is algorithmically decidable whether a DFA M accepts only onea string w .Proof: Take A = L(M) and B = {w}.

It is algorithmically decidable whether the language accepted by aDFA M includes the language accepted by a DFA M ′.Proof: We use the equivalence

L(M) ⊆ L(M ′) ⇔ L(M) ∩ L(M ′) = L(M).

COMPSCI 220: Automata and Pattern Matching 72 / 123

Page 156: COMPSCI 220: Automata and Pattern Matching

Minimisation of DFAs

Minimisation of DFAs 1

We want to minimise the number of states of a DFA, i.e. given a DFAM produce a new DFA M ′ such that:

L(M) = L(M ′),

M ′ has less states than M.

COMPSCI 220: Automata and Pattern Matching 73 / 123

Page 157: COMPSCI 220: Automata and Pattern Matching

Minimisation of DFAs

Minimisation of DFAs 1

We want to minimise the number of states of a DFA, i.e. given a DFAM produce a new DFA M ′ such that:

L(M) = L(M ′),

M ′ has less states than M.

COMPSCI 220: Automata and Pattern Matching 73 / 123

Page 158: COMPSCI 220: Automata and Pattern Matching

Minimisation of DFAs

Minimisation of DFAs 1

We want to minimise the number of states of a DFA, i.e. given a DFAM produce a new DFA M ′ such that:

L(M) = L(M ′),

M ′ has less states than M.

COMPSCI 220: Automata and Pattern Matching 73 / 123

Page 159: COMPSCI 220: Automata and Pattern Matching

Minimisation of DFAs

Minimisation of DFAs 2

q0

q1

q2

q3

a, b

a, b

a, ba

b

The state q3 can be removed without modifying the accepted language

COMPSCI 220: Automata and Pattern Matching 74 / 123

Page 160: COMPSCI 220: Automata and Pattern Matching

Minimisation of DFAs

Minimisation of DFAs 3

From a DFAM = (Q,Σ, δ, s, F )

and any state q ∈ Q we define the new DFA

Mq = (Q,Σ, δ, q, F )

by simply replacing the initial state s with q.

We say two states p and q of M are distinguishable (k-distinguishable)if there exists a string w ∈ Σ∗ (of length k) such that exactly one of Mp

or Mq accepts w .

If there is no such string w then we say p and q are equivalent.

COMPSCI 220: Automata and Pattern Matching 75 / 123

Page 161: COMPSCI 220: Automata and Pattern Matching

Minimisation of DFAs

Minimisation of DFAs 3

From a DFAM = (Q,Σ, δ, s, F )

and any state q ∈ Q we define the new DFA

Mq = (Q,Σ, δ, q, F )

by simply replacing the initial state s with q.

We say two states p and q of M are distinguishable (k-distinguishable)if there exists a string w ∈ Σ∗ (of length k) such that exactly one of Mp

or Mq accepts w .

If there is no such string w then we say p and q are equivalent.

COMPSCI 220: Automata and Pattern Matching 75 / 123

Page 162: COMPSCI 220: Automata and Pattern Matching

Minimisation of DFAs

Minimisation of DFAs 3

From a DFAM = (Q,Σ, δ, s, F )

and any state q ∈ Q we define the new DFA

Mq = (Q,Σ, δ, q, F )

by simply replacing the initial state s with q.

We say two states p and q of M are distinguishable (k-distinguishable)if there exists a string w ∈ Σ∗ (of length k) such that exactly one of Mp

or Mq accepts w .

If there is no such string w then we say p and q are equivalent.

COMPSCI 220: Automata and Pattern Matching 75 / 123

Page 163: COMPSCI 220: Automata and Pattern Matching

Minimisation of DFAs

Minimisation of DFAs 4

Questions:

Does there exist an algorithm deciding whether two states p and qare distinguishable?

Does there exist an algorithm deciding whether two states p and qare k-distinguishable?

Does there exist an algorithm deciding whether two states p and qare equivalent?

COMPSCI 220: Automata and Pattern Matching 76 / 123

Page 164: COMPSCI 220: Automata and Pattern Matching

Minimisation of DFAs

Minimisation of DFAs 4

Questions:

Does there exist an algorithm deciding whether two states p and qare distinguishable?

Does there exist an algorithm deciding whether two states p and qare k-distinguishable?

Does there exist an algorithm deciding whether two states p and qare equivalent?

COMPSCI 220: Automata and Pattern Matching 76 / 123

Page 165: COMPSCI 220: Automata and Pattern Matching

Minimisation of DFAs

Minimisation of DFAs 4

Questions:

Does there exist an algorithm deciding whether two states p and qare distinguishable?

Does there exist an algorithm deciding whether two states p and qare k-distinguishable?

Does there exist an algorithm deciding whether two states p and qare equivalent?

COMPSCI 220: Automata and Pattern Matching 76 / 123

Page 166: COMPSCI 220: Automata and Pattern Matching

Minimisation of DFAs

Minimisation of DFAs: elimination lemma 5

If a DFA M has two equivalent states p and q, then one of these statescan be eliminated without modifying the accepted language, hence wecan construct a smaller DFA M′ such that L(M) = L(M ′).

Proof: Assume M = (Q,Σ, δ, s, F ) and p 6= s. We create an equivalentDFA

M ′ = (Q \ {p},Σ, δ′, s, F \ {p}),

where δ′ is δ with all instances of δ(qi , c) = p replaced withδ′(qi , c) = q, and all instances of δ(p, c) = qi deleted.

The resulting automaton M ′ is deterministic and accepts L(M).

COMPSCI 220: Automata and Pattern Matching 77 / 123

Page 167: COMPSCI 220: Automata and Pattern Matching

Minimisation of DFAs

Minimisation of DFAs: distinguish lemma 6

Two states p and q are k-distinguishable if and only if for some c ∈ Σ,the states δ(p, c) and δ(q, c) are (k − 1)-distinguishable.

Proof: Consider all strings w = cw ′ of length k . If δ(p, c) and δ(q, c)are (k − 1)-distinguishable by some string w ′, then p and q must bek-distinguishable by w .

Likewise, if p and q are k-distinguishable by w , then there exist twostates δ(p, c) and δ(q, c) that are (k − 1)-distinguishable by the shorterstring w ′.

COMPSCI 220: Automata and Pattern Matching 78 / 123

Page 168: COMPSCI 220: Automata and Pattern Matching

Minimisation of DFAs

Minimisation of DFAs: the algorithm 7

The algorithm minimizeDFA finds the equivalent states of a DFAM = (Q,Σ, δ, s, F ). It defines a series of equivalence relations ≡0, ≡1,. . . on the states of Q:

p ≡0 q if both p and q are in F or both not in F .p ≡k+1 q if p ≡k q and, for each c ∈ Σ, δ(p, c) ≡k δ(q, c).

It stops generating these equivalence classes when ≡n and ≡n+1 areidentical.

COMPSCI 220: Automata and Pattern Matching 79 / 123

Page 169: COMPSCI 220: Automata and Pattern Matching

Minimisation of DFAs

Minimisation of DFAs: correctness of the algorithm 8

Is the algorithm correct?

Distinguish lemma guarantees no more non-equivalent states.

Since there can be at most the number of states non-equivalent states,the number of equivalence relations ≡k generated cannot be largerthan the number of states.

We can eliminate one state from M (using the elimination lemma)whenever there exist two states p and q such that p ≡n q.

Is the algorithm minimizeDFA optimal?

???

COMPSCI 220: Automata and Pattern Matching 80 / 123

Page 170: COMPSCI 220: Automata and Pattern Matching

Minimisation of DFAs

Minimisation of DFAs: correctness of the algorithm 8

Is the algorithm correct?

Distinguish lemma guarantees no more non-equivalent states.

Since there can be at most the number of states non-equivalent states,the number of equivalence relations ≡k generated cannot be largerthan the number of states.

We can eliminate one state from M (using the elimination lemma)whenever there exist two states p and q such that p ≡n q.

Is the algorithm minimizeDFA optimal?

???

COMPSCI 220: Automata and Pattern Matching 80 / 123

Page 171: COMPSCI 220: Automata and Pattern Matching

Minimisation of DFAs

Minimisation of DFAs: correctness of the algorithm 8

Is the algorithm correct?

Distinguish lemma guarantees no more non-equivalent states.

Since there can be at most the number of states non-equivalent states,the number of equivalence relations ≡k generated cannot be largerthan the number of states.

We can eliminate one state from M (using the elimination lemma)whenever there exist two states p and q such that p ≡n q.

Is the algorithm minimizeDFA optimal?

???

COMPSCI 220: Automata and Pattern Matching 80 / 123

Page 172: COMPSCI 220: Automata and Pattern Matching

Minimisation of DFAs

Minimisation of DFAs: correctness of the algorithm 8

Is the algorithm correct?

Distinguish lemma guarantees no more non-equivalent states.

Since there can be at most the number of states non-equivalent states,the number of equivalence relations ≡k generated cannot be largerthan the number of states.

We can eliminate one state from M (using the elimination lemma)whenever there exist two states p and q such that p ≡n q.

Is the algorithm minimizeDFA optimal?

???

COMPSCI 220: Automata and Pattern Matching 80 / 123

Page 173: COMPSCI 220: Automata and Pattern Matching

Minimisation of DFAs

Minimisation of DFAs: correctness of the algorithm 8

Is the algorithm correct?

Distinguish lemma guarantees no more non-equivalent states.

Since there can be at most the number of states non-equivalent states,the number of equivalence relations ≡k generated cannot be largerthan the number of states.

We can eliminate one state from M (using the elimination lemma)whenever there exist two states p and q such that p ≡n q.

Is the algorithm minimizeDFA optimal?

???

COMPSCI 220: Automata and Pattern Matching 80 / 123

Page 174: COMPSCI 220: Automata and Pattern Matching

Minimisation of DFAs

Minimisation of DFAs: correctness of the algorithm 8

Is the algorithm correct?

Distinguish lemma guarantees no more non-equivalent states.

Since there can be at most the number of states non-equivalent states,the number of equivalence relations ≡k generated cannot be largerthan the number of states.

We can eliminate one state from M (using the elimination lemma)whenever there exist two states p and q such that p ≡n q.

Is the algorithm minimizeDFA optimal?

???

COMPSCI 220: Automata and Pattern Matching 80 / 123

Page 175: COMPSCI 220: Automata and Pattern Matching

Minimisation of DFAs

Minimisation of DFAs: example 1 9

The DFA M is not minimal as:≡0= {{q0}, {q1, q2}},q1 ≡1 q2,≡1= {{q0}, {q1, q2}},≡0=≡1

becauseδ(q1, a) = q2 ≡0 δ(q2, a) = q1,δ(q1, b) = q0 ≡0 δ(q2, b) = q0

q0 q1

q2

b

a

b

a

a

b

COMPSCI 220: Automata and Pattern Matching 81 / 123

Page 176: COMPSCI 220: Automata and Pattern Matching

Minimisation of DFAs

Minimisation of DFAs: example 1 10

The following DFA is minimal and equivalent to M:

q0 q1

ab

a

b

COMPSCI 220: Automata and Pattern Matching 82 / 123

Page 177: COMPSCI 220: Automata and Pattern Matching

Minimisation of DFAs

Minimisation of DFAs: example 2 11

The DFA M is not minimal as:≡0= {{q0, q1, q3}, {q2, q4}},

≡1= {{q0}, {q1, q3}, {q2, q4}},≡2=≡1,becauseδ(q2, 0) = q2 ≡0 δ(q4, 0) = q4,δ(q2, 1) = q4 ≡0 δ(q4, 1) = q4,δ(q0, 0) = q1 6≡0 δ(q1, 0) = q2,δ(q0, 0) = q1 6≡0 δ(q3, 0) = q2,δ(q1, 0) = q2 ≡0 δ(q3, 0) = q2,δ(q1, 1) = q2 ≡0 δ(q3, 1) = q4

q0 q1 q2

q3 q4

0

0, 1

0

1

0, 1

1

0

1

COMPSCI 220: Automata and Pattern Matching 83 / 123

Page 178: COMPSCI 220: Automata and Pattern Matching

Minimisation of DFAs

Minimisation of DFAs: example 2 12

The following DFA is minimal and equivalent to M:

q0 q1

q2

0, 1

0, 1

0, 1

COMPSCI 220: Automata and Pattern Matching 84 / 123

Page 179: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Searching with GREP

A grep pattern, also known as a regular expression, describes the textthat we are looking for.

For instance, a pattern can describe words that begin with C and endin l. A pattern like this would match “Call”, “Cornwall”, and as well asmany other words, but not “Computer”.

Most characters that we type into the Find & Replace dialogue (in yourfavourite editor) match themselves. For instance, if you are looking forthe letter “s”, Grep stops and reports a match when it encounters an“s” in the text.

A range of characters can be enclosed in square brackets. Forexample [a-z] would denote the set of lower case letters. Aperiod . is a wild card symbol used to denote any character except anewline.

COMPSCI 220: Automata and Pattern Matching 85 / 123

Page 180: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Searching with GREP

A grep pattern, also known as a regular expression, describes the textthat we are looking for.

For instance, a pattern can describe words that begin with C and endin l. A pattern like this would match “Call”, “Cornwall”, and as well asmany other words, but not “Computer”.

Most characters that we type into the Find & Replace dialogue (in yourfavourite editor) match themselves. For instance, if you are looking forthe letter “s”, Grep stops and reports a match when it encounters an“s” in the text.

A range of characters can be enclosed in square brackets. Forexample [a-z] would denote the set of lower case letters. Aperiod . is a wild card symbol used to denote any character except anewline.

COMPSCI 220: Automata and Pattern Matching 85 / 123

Page 181: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Searching with GREP

A grep pattern, also known as a regular expression, describes the textthat we are looking for.

For instance, a pattern can describe words that begin with C and endin l. A pattern like this would match “Call”, “Cornwall”, and as well asmany other words, but not “Computer”.

Most characters that we type into the Find & Replace dialogue (in yourfavourite editor) match themselves. For instance, if you are looking forthe letter “s”, Grep stops and reports a match when it encounters an“s” in the text.

A range of characters can be enclosed in square brackets. Forexample [a-z] would denote the set of lower case letters. Aperiod . is a wild card symbol used to denote any character except anewline.

COMPSCI 220: Automata and Pattern Matching 85 / 123

Page 182: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Searching with GREP

A grep pattern, also known as a regular expression, describes the textthat we are looking for.

For instance, a pattern can describe words that begin with C and endin l. A pattern like this would match “Call”, “Cornwall”, and as well asmany other words, but not “Computer”.

Most characters that we type into the Find & Replace dialogue (in yourfavourite editor) match themselves. For instance, if you are looking forthe letter “s”, Grep stops and reports a match when it encounters an“s” in the text.

A range of characters can be enclosed in square brackets. Forexample [a-z] would denote the set of lower case letters. Aperiod . is a wild card symbol used to denote any character except anewline.

COMPSCI 220: Automata and Pattern Matching 85 / 123

Page 183: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Regular expressions

The Kleene regular expressions over the alphabet Σ and the sets theydesignate are:

1 Any c ∈ Σ is a regular expression denoting the set {c}.2 If E1, E2 are regular expressions and E1 denotes the set S1, E2

denotes the set S2, then so are:◮ E1 + E2 (or E1|E2) which denotes the union S1 ∪ S2,◮ E1E2 which denotes the concatenation S1S2,◮ E∗

1 which denotes the Kleene closure S∗

1 .

COMPSCI 220: Automata and Pattern Matching 86 / 123

Page 184: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Regular expressions

The Kleene regular expressions over the alphabet Σ and the sets theydesignate are:

1 Any c ∈ Σ is a regular expression denoting the set {c}.2 If E1, E2 are regular expressions and E1 denotes the set S1, E2

denotes the set S2, then so are:◮ E1 + E2 (or E1|E2) which denotes the union S1 ∪ S2,◮ E1E2 which denotes the concatenation S1S2,◮ E∗

1 which denotes the Kleene closure S∗

1 .

COMPSCI 220: Automata and Pattern Matching 86 / 123

Page 185: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Regular expressions

The Kleene regular expressions over the alphabet Σ and the sets theydesignate are:

1 Any c ∈ Σ is a regular expression denoting the set {c}.2 If E1, E2 are regular expressions and E1 denotes the set S1, E2

denotes the set S2, then so are:◮ E1 + E2 (or E1|E2) which denotes the union S1 ∪ S2,◮ E1E2 which denotes the concatenation S1S2,◮ E∗

1 which denotes the Kleene closure S∗

1 .

COMPSCI 220: Automata and Pattern Matching 86 / 123

Page 186: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Regular expressions

The Kleene regular expressions over the alphabet Σ and the sets theydesignate are:

1 Any c ∈ Σ is a regular expression denoting the set {c}.2 If E1, E2 are regular expressions and E1 denotes the set S1, E2

denotes the set S2, then so are:◮ E1 + E2 (or E1|E2) which denotes the union S1 ∪ S2,◮ E1E2 which denotes the concatenation S1S2,◮ E∗

1 which denotes the Kleene closure S∗

1 .

COMPSCI 220: Automata and Pattern Matching 86 / 123

Page 187: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Examples of regular expressions

Sample regular expressions over Σ = {a, b, c} and their correspondingsets (languages):

regular expression denoted set (language)a {a}ab {ab}a + bb {a, bb}(a + b)c {ac, bc}c∗ {ε, c, cc, ccc, . . .}(a + b + c)cba {acba, bcba, ccba}a∗ + b∗ + c∗ {ε, a, b, c, aa, bb, cc, aaa, bbb, ccc, . . .}(a + b∗)c(c∗) {ac, acc, accc, . . . , c, cc, ccc, . . . ,

bc, bcc, bbccc, . . .}

COMPSCI 220: Automata and Pattern Matching 87 / 123

Page 188: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Kleene’s Theorem 1

A regular set over an alphabet Σ is either the empty set, the set {ε}, orthe set of strings denoted by some regular expression.

Kleene’s Theorem: Regular sets coincide with regular languages.

Proof: We will show only one implication: For any regular set L there isan NFA N such that L(N) = L.

NFAs for L = ∅ and L = {ε} are easy to construct: an NFA with nofinal states works in the first case and an NFA with one initial andfinal state and no transitions works in the second case.

Now suppose E is a regular expression for L. We construct anNFA N such that L(N) = L based on the length of E . We proceedby induction.

COMPSCI 220: Automata and Pattern Matching 88 / 123

Page 189: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Kleene’s Theorem 1

A regular set over an alphabet Σ is either the empty set, the set {ε}, orthe set of strings denoted by some regular expression.

Kleene’s Theorem: Regular sets coincide with regular languages.

Proof: We will show only one implication: For any regular set L there isan NFA N such that L(N) = L.

NFAs for L = ∅ and L = {ε} are easy to construct: an NFA with nofinal states works in the first case and an NFA with one initial andfinal state and no transitions works in the second case.

Now suppose E is a regular expression for L. We construct anNFA N such that L(N) = L based on the length of E . We proceedby induction.

COMPSCI 220: Automata and Pattern Matching 88 / 123

Page 190: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Kleene’s Theorem 1

A regular set over an alphabet Σ is either the empty set, the set {ε}, orthe set of strings denoted by some regular expression.

Kleene’s Theorem: Regular sets coincide with regular languages.

Proof: We will show only one implication: For any regular set L there isan NFA N such that L(N) = L.

NFAs for L = ∅ and L = {ε} are easy to construct: an NFA with nofinal states works in the first case and an NFA with one initial andfinal state and no transitions works in the second case.

Now suppose E is a regular expression for L. We construct anNFA N such that L(N) = L based on the length of E . We proceedby induction.

COMPSCI 220: Automata and Pattern Matching 88 / 123

Page 191: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Kleene’s Theorem 1

A regular set over an alphabet Σ is either the empty set, the set {ε}, orthe set of strings denoted by some regular expression.

Kleene’s Theorem: Regular sets coincide with regular languages.

Proof: We will show only one implication: For any regular set L there isan NFA N such that L(N) = L.

NFAs for L = ∅ and L = {ε} are easy to construct: an NFA with nofinal states works in the first case and an NFA with one initial andfinal state and no transitions works in the second case.

Now suppose E is a regular expression for L. We construct anNFA N such that L(N) = L based on the length of E . We proceedby induction.

COMPSCI 220: Automata and Pattern Matching 88 / 123

Page 192: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Kleene’s Theorem 1

A regular set over an alphabet Σ is either the empty set, the set {ε}, orthe set of strings denoted by some regular expression.

Kleene’s Theorem: Regular sets coincide with regular languages.

Proof: We will show only one implication: For any regular set L there isan NFA N such that L(N) = L.

NFAs for L = ∅ and L = {ε} are easy to construct: an NFA with nofinal states works in the first case and an NFA with one initial andfinal state and no transitions works in the second case.

Now suppose E is a regular expression for L. We construct anNFA N such that L(N) = L based on the length of E . We proceedby induction.

COMPSCI 220: Automata and Pattern Matching 88 / 123

Page 193: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Kleene’s Theorem 2

Verification: If E = {c} for some c ∈ Σ, then we can takeN = (Q,Σ, δ, S, F ) where Q = {q0, q1}, S = {q0}, F = {q1} andthere is one transition δ(q0, c) = q1.Induction:

◮ If N1, N2 are NFAs accepting the languages denoted by E1 and E2,respectively, then in view of the closure under union the NFA Nunion

accepts the language denoted by E1 + E2:

L(Nunion) = L(N1) ∪ L(N2).

COMPSCI 220: Automata and Pattern Matching 89 / 123

Page 194: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Kleene’s Theorem 2

Verification: If E = {c} for some c ∈ Σ, then we can takeN = (Q,Σ, δ, S, F ) where Q = {q0, q1}, S = {q0}, F = {q1} andthere is one transition δ(q0, c) = q1.Induction:

◮ If N1, N2 are NFAs accepting the languages denoted by E1 and E2,respectively, then in view of the closure under union the NFA Nunion

accepts the language denoted by E1 + E2:

L(Nunion) = L(N1) ∪ L(N2).

COMPSCI 220: Automata and Pattern Matching 89 / 123

Page 195: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Kleene’s Theorem 3

Induction (continued):◮ If N1, N2 are NFAs accepting the languages denoted by E1 and E2,

respectively, then in view of the closure under concatenation theNFA Nconcatenation accepts the language denoted by E1E2:

L(Nconcatenation) = L(N1)L(N2).

◮ If N1 is a NFA accepting the language denoted by E1, then in viewof the closure under Kleene closure the NFA N∗ accepts thelanguage denoted by E∗

1 :

L(N∗) = L(N1)∗.

COMPSCI 220: Automata and Pattern Matching 90 / 123

Page 196: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Kleene’s Theorem 3

Induction (continued):◮ If N1, N2 are NFAs accepting the languages denoted by E1 and E2,

respectively, then in view of the closure under concatenation theNFA Nconcatenation accepts the language denoted by E1E2:

L(Nconcatenation) = L(N1)L(N2).

◮ If N1 is a NFA accepting the language denoted by E1, then in viewof the closure under Kleene closure the NFA N∗ accepts thelanguage denoted by E∗

1 :

L(N∗) = L(N1)∗.

COMPSCI 220: Automata and Pattern Matching 90 / 123

Page 197: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Kleene’s Theorem 3

Induction (continued):◮ If N1, N2 are NFAs accepting the languages denoted by E1 and E2,

respectively, then in view of the closure under concatenation theNFA Nconcatenation accepts the language denoted by E1E2:

L(Nconcatenation) = L(N1)L(N2).

◮ If N1 is a NFA accepting the language denoted by E1, then in viewof the closure under Kleene closure the NFA N∗ accepts thelanguage denoted by E∗

1 :

L(N∗) = L(N1)∗.

COMPSCI 220: Automata and Pattern Matching 90 / 123

Page 198: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Kleene’s Theorem: an example 1

Construct an NFA accepting exactly the language denoted by theregular expression: (01)∗ + 1.We use the closure properties of regular languages:

construct NFAs N1 and N2 accepting the languages {0} and {1},respectively

construct an NFA N3 for the concatenation of L(N1) and L(N2)obtaining the language {01}

construct an NFA N4 for the Kleene closure of L(N3) so obtaining{01}∗

construct an NFA N5 for the union of L(N4) and L(N2) obtainingthe language {01}∗ ∪ {1}

we may want to transform N5 into an equivalent DFA (alsominimise it)

COMPSCI 220: Automata and Pattern Matching 91 / 123

Page 199: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Kleene’s Theorem: an example 1

Construct an NFA accepting exactly the language denoted by theregular expression: (01)∗ + 1.We use the closure properties of regular languages:

construct NFAs N1 and N2 accepting the languages {0} and {1},respectively

construct an NFA N3 for the concatenation of L(N1) and L(N2)obtaining the language {01}

construct an NFA N4 for the Kleene closure of L(N3) so obtaining{01}∗

construct an NFA N5 for the union of L(N4) and L(N2) obtainingthe language {01}∗ ∪ {1}

we may want to transform N5 into an equivalent DFA (alsominimise it)

COMPSCI 220: Automata and Pattern Matching 91 / 123

Page 200: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Kleene’s Theorem: an example 1

Construct an NFA accepting exactly the language denoted by theregular expression: (01)∗ + 1.We use the closure properties of regular languages:

construct NFAs N1 and N2 accepting the languages {0} and {1},respectively

construct an NFA N3 for the concatenation of L(N1) and L(N2)obtaining the language {01}

construct an NFA N4 for the Kleene closure of L(N3) so obtaining{01}∗

construct an NFA N5 for the union of L(N4) and L(N2) obtainingthe language {01}∗ ∪ {1}

we may want to transform N5 into an equivalent DFA (alsominimise it)

COMPSCI 220: Automata and Pattern Matching 91 / 123

Page 201: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Kleene’s Theorem: an example 1

Construct an NFA accepting exactly the language denoted by theregular expression: (01)∗ + 1.We use the closure properties of regular languages:

construct NFAs N1 and N2 accepting the languages {0} and {1},respectively

construct an NFA N3 for the concatenation of L(N1) and L(N2)obtaining the language {01}

construct an NFA N4 for the Kleene closure of L(N3) so obtaining{01}∗

construct an NFA N5 for the union of L(N4) and L(N2) obtainingthe language {01}∗ ∪ {1}

we may want to transform N5 into an equivalent DFA (alsominimise it)

COMPSCI 220: Automata and Pattern Matching 91 / 123

Page 202: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Kleene’s Theorem: an example 1

Construct an NFA accepting exactly the language denoted by theregular expression: (01)∗ + 1.We use the closure properties of regular languages:

construct NFAs N1 and N2 accepting the languages {0} and {1},respectively

construct an NFA N3 for the concatenation of L(N1) and L(N2)obtaining the language {01}

construct an NFA N4 for the Kleene closure of L(N3) so obtaining{01}∗

construct an NFA N5 for the union of L(N4) and L(N2) obtainingthe language {01}∗ ∪ {1}

we may want to transform N5 into an equivalent DFA (alsominimise it)

COMPSCI 220: Automata and Pattern Matching 91 / 123

Page 203: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Kleene’s Theorem: an example 1

Construct an NFA accepting exactly the language denoted by theregular expression: (01)∗ + 1.We use the closure properties of regular languages:

construct NFAs N1 and N2 accepting the languages {0} and {1},respectively

construct an NFA N3 for the concatenation of L(N1) and L(N2)obtaining the language {01}

construct an NFA N4 for the Kleene closure of L(N3) so obtaining{01}∗

construct an NFA N5 for the union of L(N4) and L(N2) obtainingthe language {01}∗ ∪ {1}

we may want to transform N5 into an equivalent DFA (alsominimise it)

COMPSCI 220: Automata and Pattern Matching 91 / 123

Page 204: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Kleene’s Theorem: an example 1

Construct an NFA accepting exactly the language denoted by theregular expression: (01)∗ + 1.We use the closure properties of regular languages:

construct NFAs N1 and N2 accepting the languages {0} and {1},respectively

construct an NFA N3 for the concatenation of L(N1) and L(N2)obtaining the language {01}

construct an NFA N4 for the Kleene closure of L(N3) so obtaining{01}∗

construct an NFA N5 for the union of L(N4) and L(N2) obtainingthe language {01}∗ ∪ {1}

we may want to transform N5 into an equivalent DFA (alsominimise it)

COMPSCI 220: Automata and Pattern Matching 91 / 123

Page 205: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Kleene’s Theorem: other examples 2

Construct a regular expression denoting the language:

A = {0n1m | n, m ≥ 0}.

The language L is regular and

A = {0n1m | n, m ≥ 0}

= {0n | n ≥ 0}{1m | m ≥ 0}

so A is denoted by 0∗1∗.

There is no a regular expression denoting the language:

B = {0n1n | n ≥ 0}

because B is not regular.

COMPSCI 220: Automata and Pattern Matching 92 / 123

Page 206: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Kleene’s Theorem: other examples 2

Construct a regular expression denoting the language:

A = {0n1m | n, m ≥ 0}.

The language L is regular and

A = {0n1m | n, m ≥ 0}

= {0n | n ≥ 0}{1m | m ≥ 0}

so A is denoted by 0∗1∗.

There is no a regular expression denoting the language:

B = {0n1n | n ≥ 0}

because B is not regular.

COMPSCI 220: Automata and Pattern Matching 92 / 123

Page 207: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Kleene’s Theorem: other examples 3

There is no a regular expression denoting the language:

C = {uuww | u, w ∈ {a, b}∗}

because C is not regular. Prove this fact!

COMPSCI 220: Automata and Pattern Matching 93 / 123

Page 208: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Kleene’s Theorem: other examples 3

There is no a regular expression denoting the language:

C = {uuww | u, w ∈ {a, b}∗}

because C is not regular. Prove this fact!

COMPSCI 220: Automata and Pattern Matching 93 / 123

Page 209: COMPSCI 220: Automata and Pattern Matching

Pattern matching

The pattern matching problem

The pattern matching problem:

Given a (short) pattern P and a (long) text T , (over analphabet Σ) determine whether P appears somewhere in T .

Example: If P = aba and T = baabababaaaba, then the firstoccurrence of P in T appears at the third character:

T = baabababaaaba

Of course, there are some other occurrences.

COMPSCI 220: Automata and Pattern Matching 94 / 123

Page 210: COMPSCI 220: Automata and Pattern Matching

Pattern matching

The pattern matching problem

The pattern matching problem:

Given a (short) pattern P and a (long) text T , (over analphabet Σ) determine whether P appears somewhere in T .

Example: If P = aba and T = baabababaaaba, then the firstoccurrence of P in T appears at the third character:

T = baabababaaaba

Of course, there are some other occurrences.

COMPSCI 220: Automata and Pattern Matching 94 / 123

Page 211: COMPSCI 220: Automata and Pattern Matching

Pattern matching

The pattern matching problem

The pattern matching problem:

Given a (short) pattern P and a (long) text T , (over analphabet Σ) determine whether P appears somewhere in T .

Example: If P = aba and T = baabababaaaba, then the firstoccurrence of P in T appears at the third character:

T = baabababaaaba

Of course, there are some other occurrences.

COMPSCI 220: Automata and Pattern Matching 94 / 123

Page 212: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Naive string matching 1

Try each possible position the pattern P[1..m] could appear in the textT [1..n]:

for (i=0; T[i] != ’\0’; i++){

for (j=0; T[i+j] != ’\0’ && P[j] != ’\0’&& T[i+j]==P[j]; j++) ;

if (P[j] == ’\0’) found a match}

There are two nested loops; the inner one takes O(m) iterations andthe outer one takes O(n) iterations so the total time is the product,O(mn). This is slow!

COMPSCI 220: Automata and Pattern Matching 95 / 123

Page 213: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Naive string matching 2

An example: if T [1..n] is an, and P[1..m] is b, then it takes mcomparisons each time to discover that we don’t have a match, so mnoverall.

The worst case scenario may not be too frequent because the innerloop usually finds a mismatch quickly and moves on to the nextposition without going through all m steps.

Can we do it better?

COMPSCI 220: Automata and Pattern Matching 96 / 123

Page 214: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Naive string matching 2

An example: if T [1..n] is an, and P[1..m] is b, then it takes mcomparisons each time to discover that we don’t have a match, so mnoverall.

The worst case scenario may not be too frequent because the innerloop usually finds a mismatch quickly and moves on to the nextposition without going through all m steps.

Can we do it better?

COMPSCI 220: Automata and Pattern Matching 96 / 123

Page 215: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Naive string matching 2

An example: if T [1..n] is an, and P[1..m] is b, then it takes mcomparisons each time to discover that we don’t have a match, so mnoverall.

The worst case scenario may not be too frequent because the innerloop usually finds a mismatch quickly and moves on to the nextposition without going through all m steps.

Can we do it better?

COMPSCI 220: Automata and Pattern Matching 96 / 123

Page 216: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Pattern matching and regular languages 1

Solution: Consider the language

A(P) = {x | x contains the pattern P}.

Assume that A(P) is regular! Let M be a DFA for A(P). Whenprocessing an input M must enter an accepting state when it has justfinished ‘seeing’ the first occurrence of P, and thereafter it must remainin some accepting state or other.

COMPSCI 220: Automata and Pattern Matching 97 / 123

Page 217: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Pattern matching and regular languages 2

Is A(P) regular?

Answer: yes.

Example: If P = aba and the alphabet is {a, b}, then

A(P) = {x ∈ {a, b}∗ | x = uPv , for some u, v ∈ {a, b}∗},

or

A(P) = {uabav | u, v ∈ {a, b}∗}.

COMPSCI 220: Automata and Pattern Matching 98 / 123

Page 218: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Pattern matching and regular languages 2

Is A(P) regular?

Answer: yes.

Example: If P = aba and the alphabet is {a, b}, then

A(P) = {x ∈ {a, b}∗ | x = uPv , for some u, v ∈ {a, b}∗},

or

A(P) = {uabav | u, v ∈ {a, b}∗}.

COMPSCI 220: Automata and Pattern Matching 98 / 123

Page 219: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Pattern matching and regular languages 2

Is A(P) regular?

Answer: yes.

Example: If P = aba and the alphabet is {a, b}, then

A(P) = {x ∈ {a, b}∗ | x = uPv , for some u, v ∈ {a, b}∗},

or

A(P) = {uabav | u, v ∈ {a, b}∗}.

COMPSCI 220: Automata and Pattern Matching 98 / 123

Page 220: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Pattern matching and regular languages 3

q0

b

q1

a

q2

q3

a, ba

b

b

a

A DFA for AP(aba)

COMPSCI 220: Automata and Pattern Matching 99 / 123

Page 221: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Pattern matching and regular languages 4

q0

q1

q3

q2

a, b

a, b

a

a

b

An NFA for A(aba)

COMPSCI 220: Automata and Pattern Matching 100 / 123

Page 222: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Pattern matching and regular languages 5

For every string P, the language

A(P) = {uPv | u, v ∈ {a, b}∗}

is regular.

Proof: Let M be a DFA recognising exactly {P}. An NFA recognisingA(P) can be obtained from a DFA M by adding loops labelled with aand b to the initial and final states of M.

COMPSCI 220: Automata and Pattern Matching 101 / 123

Page 223: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Pattern matching and regular languages 5

For every string P, the language

A(P) = {uPv | u, v ∈ {a, b}∗}

is regular.

Proof: Let M be a DFA recognising exactly {P}. An NFA recognisingA(P) can be obtained from a DFA M by adding loops labelled with aand b to the initial and final states of M.

COMPSCI 220: Automata and Pattern Matching 101 / 123

Page 224: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Pattern matching and regular languages 6

Is the fact that A(P) is regular of any use?

Yes, because there is an algorithm testing the membership problem forA(P) which is the same as testing whether P appears in the input textT .

How complex is this algorithm?

COMPSCI 220: Automata and Pattern Matching 102 / 123

Page 225: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Pattern matching and regular languages 6

Is the fact that A(P) is regular of any use?

Yes, because there is an algorithm testing the membership problem forA(P) which is the same as testing whether P appears in the input textT .

How complex is this algorithm?

COMPSCI 220: Automata and Pattern Matching 102 / 123

Page 226: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Pattern matching and regular languages 6

Is the fact that A(P) is regular of any use?

Yes, because there is an algorithm testing the membership problem forA(P) which is the same as testing whether P appears in the input textT .

How complex is this algorithm?

COMPSCI 220: Automata and Pattern Matching 102 / 123

Page 227: COMPSCI 220: Automata and Pattern Matching

Pattern matching

An efficient “automata-theoretic” solution

We will present an efficient “automata-theoretic” solution whichconsists of:

1 “pre-processing”: building a DFA M for each pattern P[1..m], then2 running M on the text T [1..n].

The complexity of this solution is the sum of the complexities of theabove two steps.

COMPSCI 220: Automata and Pattern Matching 103 / 123

Page 228: COMPSCI 220: Automata and Pattern Matching

Pattern matching

An efficient “automata-theoretic” solution

We will present an efficient “automata-theoretic” solution whichconsists of:

1 “pre-processing”: building a DFA M for each pattern P[1..m], then2 running M on the text T [1..n].

The complexity of this solution is the sum of the complexities of theabove two steps.

COMPSCI 220: Automata and Pattern Matching 103 / 123

Page 229: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Prefix, suffix and the suffix function 1

We introduce the prefix and suffix relations and the suffix function.

A string w is a prefix of the string x , w ≤prefix x if x = wz for somestring z.

Let Pk be P[1..k ], the prefix of length k ≤ m of P[1..m].

A string w is a suffix of the string x , w ≤suffix x if x = yw for somestring y .

For example: a ≤prefix ab, ca ≤suffix aabbca, baab ≤suffix baab, butca 6≤prefix aaaaba, ab 6≤suffix abb.

For each string x , ε ≤prefix x , ε ≤suffix x .

COMPSCI 220: Automata and Pattern Matching 104 / 123

Page 230: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Prefix, suffix and the suffix function 1

We introduce the prefix and suffix relations and the suffix function.

A string w is a prefix of the string x , w ≤prefix x if x = wz for somestring z.

Let Pk be P[1..k ], the prefix of length k ≤ m of P[1..m].

A string w is a suffix of the string x , w ≤suffix x if x = yw for somestring y .

For example: a ≤prefix ab, ca ≤suffix aabbca, baab ≤suffix baab, butca 6≤prefix aaaaba, ab 6≤suffix abb.

For each string x , ε ≤prefix x , ε ≤suffix x .

COMPSCI 220: Automata and Pattern Matching 104 / 123

Page 231: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Prefix, suffix and the suffix function 1

We introduce the prefix and suffix relations and the suffix function.

A string w is a prefix of the string x , w ≤prefix x if x = wz for somestring z.

Let Pk be P[1..k ], the prefix of length k ≤ m of P[1..m].

A string w is a suffix of the string x , w ≤suffix x if x = yw for somestring y .

For example: a ≤prefix ab, ca ≤suffix aabbca, baab ≤suffix baab, butca 6≤prefix aaaaba, ab 6≤suffix abb.

For each string x , ε ≤prefix x , ε ≤suffix x .

COMPSCI 220: Automata and Pattern Matching 104 / 123

Page 232: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Prefix, suffix and the suffix function 1

We introduce the prefix and suffix relations and the suffix function.

A string w is a prefix of the string x , w ≤prefix x if x = wz for somestring z.

Let Pk be P[1..k ], the prefix of length k ≤ m of P[1..m].

A string w is a suffix of the string x , w ≤suffix x if x = yw for somestring y .

For example: a ≤prefix ab, ca ≤suffix aabbca, baab ≤suffix baab, butca 6≤prefix aaaaba, ab 6≤suffix abb.

For each string x , ε ≤prefix x , ε ≤suffix x .

COMPSCI 220: Automata and Pattern Matching 104 / 123

Page 233: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Prefix, suffix and the suffix function 1

We introduce the prefix and suffix relations and the suffix function.

A string w is a prefix of the string x , w ≤prefix x if x = wz for somestring z.

Let Pk be P[1..k ], the prefix of length k ≤ m of P[1..m].

A string w is a suffix of the string x , w ≤suffix x if x = yw for somestring y .

For example: a ≤prefix ab, ca ≤suffix aabbca, baab ≤suffix baab, butca 6≤prefix aaaaba, ab 6≤suffix abb.

For each string x , ε ≤prefix x , ε ≤suffix x .

COMPSCI 220: Automata and Pattern Matching 104 / 123

Page 234: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Prefix, suffix and the suffix function 1

We introduce the prefix and suffix relations and the suffix function.

A string w is a prefix of the string x , w ≤prefix x if x = wz for somestring z.

Let Pk be P[1..k ], the prefix of length k ≤ m of P[1..m].

A string w is a suffix of the string x , w ≤suffix x if x = yw for somestring y .

For example: a ≤prefix ab, ca ≤suffix aabbca, baab ≤suffix baab, butca 6≤prefix aaaaba, ab 6≤suffix abb.

For each string x , ε ≤prefix x , ε ≤suffix x .

COMPSCI 220: Automata and Pattern Matching 104 / 123

Page 235: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Prefix, suffix and the suffix function 2

The suffix function associated to the pattern P[1..m] is the function

σ : Σ∗ → {0, 1, . . . , m}

defined as follows: σ(x) is the length of the longest prefix of P that is asuffix of x ,

σ(x) = max{k : Pk ≤suffix x}.

COMPSCI 220: Automata and Pattern Matching 105 / 123

Page 236: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Prefix, suffix and the suffix function 3

For example, if P = nano and Σ = {a, b, n, o} thenP0 = ε, P1 = n, P2 = na, P3 = nan, P4 = nano = P (so m = 4).

σ(ε) = 0σ(annnao) = 0

σ(aonaaanna) = 2σ(aon) = 1

σ(aonaaannano) = 4σ(annnaanan) = 3.

COMPSCI 220: Automata and Pattern Matching 106 / 123

Page 237: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Prefix, suffix and the suffix function 3

For example, if P = nano and Σ = {a, b, n, o} thenP0 = ε, P1 = n, P2 = na, P3 = nan, P4 = nano = P (so m = 4).

σ(ε) = 0σ(annnao) = 0

σ(aonaaanna) = 2σ(aon) = 1

σ(aonaaannano) = 4σ(annnaanan) = 3.

COMPSCI 220: Automata and Pattern Matching 106 / 123

Page 238: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Prefix, suffix and the suffix function 3

For example, if P = nano and Σ = {a, b, n, o} thenP0 = ε, P1 = n, P2 = na, P3 = nan, P4 = nano = P (so m = 4).

σ(ε) = 0σ(annnao) = 0

σ(aonaaanna) = 2σ(aon) = 1

σ(aonaaannano) = 4σ(annnaanan) = 3.

COMPSCI 220: Automata and Pattern Matching 106 / 123

Page 239: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Prefix, suffix and the suffix function 3

For example, if P = nano and Σ = {a, b, n, o} thenP0 = ε, P1 = n, P2 = na, P3 = nan, P4 = nano = P (so m = 4).

σ(ε) = 0σ(annnao) = 0

σ(aonaaanna) = 2σ(aon) = 1

σ(aonaaannano) = 4σ(annnaanan) = 3.

COMPSCI 220: Automata and Pattern Matching 106 / 123

Page 240: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Prefix, suffix and the suffix function 3

For example, if P = nano and Σ = {a, b, n, o} thenP0 = ε, P1 = n, P2 = na, P3 = nan, P4 = nano = P (so m = 4).

σ(ε) = 0σ(annnao) = 0

σ(aonaaanna) = 2σ(aon) = 1

σ(aonaaannano) = 4σ(annnaanan) = 3.

COMPSCI 220: Automata and Pattern Matching 106 / 123

Page 241: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Prefix, suffix and the suffix function 3

For example, if P = nano and Σ = {a, b, n, o} thenP0 = ε, P1 = n, P2 = na, P3 = nan, P4 = nano = P (so m = 4).

σ(ε) = 0σ(annnao) = 0

σ(aonaaanna) = 2σ(aon) = 1

σ(aonaaannano) = 4σ(annnaanan) = 3.

COMPSCI 220: Automata and Pattern Matching 106 / 123

Page 242: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Prefix, suffix and the suffix function 3

For example, if P = nano and Σ = {a, b, n, o} thenP0 = ε, P1 = n, P2 = na, P3 = nan, P4 = nano = P (so m = 4).

σ(ε) = 0σ(annnao) = 0

σ(aonaaanna) = 2σ(aon) = 1

σ(aonaaannano) = 4σ(annnaanan) = 3.

COMPSCI 220: Automata and Pattern Matching 106 / 123

Page 243: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Aho-Corasick automaton 1

The automaton states will record partial matches to thepattern.

In particular they will tell whether

we have already matched P[1..m] in T [1..n], and, if not,

we could possibly be in the middle of a match.

So we will have m + 1 states: the initial and accept states are clear:0, m, respectively.

The transition function from (state, character) to state is the longeststring that is simultaneously a prefix of the pattern and a suffix of thatprefix of the pattern plus the character we have just scanned.

COMPSCI 220: Automata and Pattern Matching 107 / 123

Page 244: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Aho-Corasick automaton 1

The automaton states will record partial matches to thepattern.

In particular they will tell whether

we have already matched P[1..m] in T [1..n], and, if not,

we could possibly be in the middle of a match.

So we will have m + 1 states: the initial and accept states are clear:0, m, respectively.

The transition function from (state, character) to state is the longeststring that is simultaneously a prefix of the pattern and a suffix of thatprefix of the pattern plus the character we have just scanned.

COMPSCI 220: Automata and Pattern Matching 107 / 123

Page 245: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Aho-Corasick automaton 1

The automaton states will record partial matches to thepattern.

In particular they will tell whether

we have already matched P[1..m] in T [1..n], and, if not,

we could possibly be in the middle of a match.

So we will have m + 1 states: the initial and accept states are clear:0, m, respectively.

The transition function from (state, character) to state is the longeststring that is simultaneously a prefix of the pattern and a suffix of thatprefix of the pattern plus the character we have just scanned.

COMPSCI 220: Automata and Pattern Matching 107 / 123

Page 246: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Aho-Corasick automaton 2

Given the pattern P[1..m] over Σ, the Aho-Corasick DFAM = (Q,Σ, δ, s, F ) is constructed as follows:

1 the set of states: Q = {0, 1, . . . , m},2 the alphabet: Σ,3 the transition function δ from Q × Σ to Q is defined by

δ(q, x) = σ(Pqx),

where q ∈ Q, x ∈ Σ,4 0 is the start state,5 F = {m} is the (unique) accepting state.

COMPSCI 220: Automata and Pattern Matching 108 / 123

Page 247: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Aho-Corasick automaton 2

Given the pattern P[1..m] over Σ, the Aho-Corasick DFAM = (Q,Σ, δ, s, F ) is constructed as follows:

1 the set of states: Q = {0, 1, . . . , m},2 the alphabet: Σ,3 the transition function δ from Q × Σ to Q is defined by

δ(q, x) = σ(Pqx),

where q ∈ Q, x ∈ Σ,4 0 is the start state,5 F = {m} is the (unique) accepting state.

COMPSCI 220: Automata and Pattern Matching 108 / 123

Page 248: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Aho-Corasick automaton 2

Given the pattern P[1..m] over Σ, the Aho-Corasick DFAM = (Q,Σ, δ, s, F ) is constructed as follows:

1 the set of states: Q = {0, 1, . . . , m},2 the alphabet: Σ,3 the transition function δ from Q × Σ to Q is defined by

δ(q, x) = σ(Pqx),

where q ∈ Q, x ∈ Σ,4 0 is the start state,5 F = {m} is the (unique) accepting state.

COMPSCI 220: Automata and Pattern Matching 108 / 123

Page 249: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Aho-Corasick automaton 2

Given the pattern P[1..m] over Σ, the Aho-Corasick DFAM = (Q,Σ, δ, s, F ) is constructed as follows:

1 the set of states: Q = {0, 1, . . . , m},2 the alphabet: Σ,3 the transition function δ from Q × Σ to Q is defined by

δ(q, x) = σ(Pqx),

where q ∈ Q, x ∈ Σ,4 0 is the start state,5 F = {m} is the (unique) accepting state.

COMPSCI 220: Automata and Pattern Matching 108 / 123

Page 250: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Aho-Corasick automaton 2

Given the pattern P[1..m] over Σ, the Aho-Corasick DFAM = (Q,Σ, δ, s, F ) is constructed as follows:

1 the set of states: Q = {0, 1, . . . , m},2 the alphabet: Σ,3 the transition function δ from Q × Σ to Q is defined by

δ(q, x) = σ(Pqx),

where q ∈ Q, x ∈ Σ,4 0 is the start state,5 F = {m} is the (unique) accepting state.

COMPSCI 220: Automata and Pattern Matching 108 / 123

Page 251: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Aho-Corasick automaton 2

Given the pattern P[1..m] over Σ, the Aho-Corasick DFAM = (Q,Σ, δ, s, F ) is constructed as follows:

1 the set of states: Q = {0, 1, . . . , m},2 the alphabet: Σ,3 the transition function δ from Q × Σ to Q is defined by

δ(q, x) = σ(Pqx),

where q ∈ Q, x ∈ Σ,4 0 is the start state,5 F = {m} is the (unique) accepting state.

COMPSCI 220: Automata and Pattern Matching 108 / 123

Page 252: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Aho-Corasick automaton 3

Here is an example for the alphabet Σ = {a, b, n, o} and patternP = nano, so m = 4. Aho-Corasick automaton M will have:

1 the set of states: Q = {0, 1, 2, 3, 4},2 the alphabet: Σ = {a, b, n, o},3 the transition function δ from Q × Σ to Q is defined by

δ(q, x) = σ(Pqx),

where 0 ≤ q ≤ 4, x ∈ {a, b, n, o},4 0 is the start state,5 F = {4} is the accepting state.

COMPSCI 220: Automata and Pattern Matching 109 / 123

Page 253: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Aho-Corasick automaton 3

Here is an example for the alphabet Σ = {a, b, n, o} and patternP = nano, so m = 4. Aho-Corasick automaton M will have:

1 the set of states: Q = {0, 1, 2, 3, 4},2 the alphabet: Σ = {a, b, n, o},3 the transition function δ from Q × Σ to Q is defined by

δ(q, x) = σ(Pqx),

where 0 ≤ q ≤ 4, x ∈ {a, b, n, o},4 0 is the start state,5 F = {4} is the accepting state.

COMPSCI 220: Automata and Pattern Matching 109 / 123

Page 254: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Aho-Corasick automaton 3

Here is an example for the alphabet Σ = {a, b, n, o} and patternP = nano, so m = 4. Aho-Corasick automaton M will have:

1 the set of states: Q = {0, 1, 2, 3, 4},2 the alphabet: Σ = {a, b, n, o},3 the transition function δ from Q × Σ to Q is defined by

δ(q, x) = σ(Pqx),

where 0 ≤ q ≤ 4, x ∈ {a, b, n, o},4 0 is the start state,5 F = {4} is the accepting state.

COMPSCI 220: Automata and Pattern Matching 109 / 123

Page 255: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Aho-Corasick automaton 3

Here is an example for the alphabet Σ = {a, b, n, o} and patternP = nano, so m = 4. Aho-Corasick automaton M will have:

1 the set of states: Q = {0, 1, 2, 3, 4},2 the alphabet: Σ = {a, b, n, o},3 the transition function δ from Q × Σ to Q is defined by

δ(q, x) = σ(Pqx),

where 0 ≤ q ≤ 4, x ∈ {a, b, n, o},4 0 is the start state,5 F = {4} is the accepting state.

COMPSCI 220: Automata and Pattern Matching 109 / 123

Page 256: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Aho-Corasick automaton 3

Here is an example for the alphabet Σ = {a, b, n, o} and patternP = nano, so m = 4. Aho-Corasick automaton M will have:

1 the set of states: Q = {0, 1, 2, 3, 4},2 the alphabet: Σ = {a, b, n, o},3 the transition function δ from Q × Σ to Q is defined by

δ(q, x) = σ(Pqx),

where 0 ≤ q ≤ 4, x ∈ {a, b, n, o},4 0 is the start state,5 F = {4} is the accepting state.

COMPSCI 220: Automata and Pattern Matching 109 / 123

Page 257: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Aho-Corasick automaton 4

The transition function δ is calculated as follows:

δ(0, a) = σ(P0a) = σ(εa) = σ(a) = 0δ(0, b) = σ(P0b) = σ(εb) = σ(b) = 0δ(0, n) = σ(P0n) = σ(εn) = σ(n) = 1δ(0, o) = σ(P0o) = σ(εo) = σ(0) = 0

δ(1, a) = σ(P1a) = σ(na) = 2δ(1, b) = σ(P1b) = σ(nb) = 0δ(1, n) = σ(P1n) = σ(nn) = 1δ(1, o) = σ(P1o) = σ(no) = 0

δ(2, a) = σ(P2a) = σ(naa) = 0δ(2, b) = σ(P2b) = σ(nab) = 0δ(2, n) = σ(P2n) = σ(nan) = 3δ(2, o) = σ(P2o) = σ(nao) = 0

COMPSCI 220: Automata and Pattern Matching 110 / 123

Page 258: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Aho-Corasick automaton 4

The transition function δ is calculated as follows:

δ(0, a) = σ(P0a) = σ(εa) = σ(a) = 0δ(0, b) = σ(P0b) = σ(εb) = σ(b) = 0δ(0, n) = σ(P0n) = σ(εn) = σ(n) = 1δ(0, o) = σ(P0o) = σ(εo) = σ(0) = 0

δ(1, a) = σ(P1a) = σ(na) = 2δ(1, b) = σ(P1b) = σ(nb) = 0δ(1, n) = σ(P1n) = σ(nn) = 1δ(1, o) = σ(P1o) = σ(no) = 0

δ(2, a) = σ(P2a) = σ(naa) = 0δ(2, b) = σ(P2b) = σ(nab) = 0δ(2, n) = σ(P2n) = σ(nan) = 3δ(2, o) = σ(P2o) = σ(nao) = 0

COMPSCI 220: Automata and Pattern Matching 110 / 123

Page 259: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Aho-Corasick automaton 5

δ(3, a) = σ(P3a) = σ(nana) = 2δ(3, b) = σ(P3b) = σ(nanb) = 0δ(3, n) = σ(P3n) = σ(nann) = 1δ(3, o) = σ(P3o) = σ(nano) = 4

δ(4, a) = σ(P4a) = σ(nanoa) = 0δ(4, b) = σ(P4b) = σ(nanob) = 0δ(4, n) = σ(P4n) = σ(nanon) = 1δ(4, o) = σ(P4o) = σ(nanoo) = 0

COMPSCI 220: Automata and Pattern Matching 111 / 123

Page 260: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Aho-Corasick automaton 6

A compact presentation of the transition function:

δ(q, x) a b n o P

0 0 0 1 0 n1 2 0 1 0 a2 0 0 3 0 n3 2 0 1 4 o4 0 0 1 0

COMPSCI 220: Automata and Pattern Matching 112 / 123

Page 261: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Aho-Corasick automaton 7

0 1 2 3 4

a, b, o n

n a

b, o

n

a, b, o

a

b

n

o

n

a, b, o

COMPSCI 220: Automata and Pattern Matching 113 / 123

Page 262: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Aho-Corasick automaton 8

The following procedure computes the transition function:

COMPUTE-TRANSITION-FUNCTION (P,Σ)1. m = length[P]2. for q = 0 to m3. do for each character x ∈ Σ4. do k = min(m + 1, q + 2)5. repeat k = k − 16. until Pk ≤suffix Pqx7. δ(q, a) = k8. return δ

COMPSCI 220: Automata and Pattern Matching 114 / 123

Page 263: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Aho-Corasick automaton 9

The procedure computes δ(q, x) in a straightforward manner: it startswith the largest possible value for k , which is min(m, q + 1) anddecreases k until Pk ≤suffix Pqx .

The running time is O(m3 × number of elements in Σ): the outer loopscontribute a factor of m× number of elements in Σ, the inner loops canrun at most m + 1 times and the test Pk ≤suffix Pqx on line 6. canrequire to compare up to m characters.

A clever algorithm requiring O(m × number of elements in Σ) exists!

COMPSCI 220: Automata and Pattern Matching 115 / 123

Page 264: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Aho-Corasick automaton 9

The procedure computes δ(q, x) in a straightforward manner: it startswith the largest possible value for k , which is min(m, q + 1) anddecreases k until Pk ≤suffix Pqx .

The running time is O(m3 × number of elements in Σ): the outer loopscontribute a factor of m× number of elements in Σ, the inner loops canrun at most m + 1 times and the test Pk ≤suffix Pqx on line 6. canrequire to compare up to m characters.

A clever algorithm requiring O(m × number of elements in Σ) exists!

COMPSCI 220: Automata and Pattern Matching 115 / 123

Page 265: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Aho-Corasick automaton 9

The procedure computes δ(q, x) in a straightforward manner: it startswith the largest possible value for k , which is min(m, q + 1) anddecreases k until Pk ≤suffix Pqx .

The running time is O(m3 × number of elements in Σ): the outer loopscontribute a factor of m× number of elements in Σ, the inner loops canrun at most m + 1 times and the test Pk ≤suffix Pqx on line 6. canrequire to compare up to m characters.

A clever algorithm requiring O(m × number of elements in Σ) exists!

COMPSCI 220: Automata and Pattern Matching 115 / 123

Page 266: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Aho-Corasick automaton 10

The following algorithm runs Aho-Corasick automaton for the pattern Pon the text T :

FINITE-AUTOMATON-MATCHER (Σ, δ, T )1. n = length[T ]2. q = 03. for i=1 to n4. do q = δ(q, T [i])5. if q = m6. then print ‘Pattern occurs at

position i − m’ and return7. Print ‘Pattern doesn’t occur’

COMPSCI 220: Automata and Pattern Matching 116 / 123

Page 267: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Aho-Corasick automaton 11

The simple loop structure of the above algorithm shows that therunning time on T [1..n] is O(n). The overall running time, i.e. whichincludes the pre-processing, is now

O(m × number of elements in Σ) + O(n).

COMPSCI 220: Automata and Pattern Matching 117 / 123

Page 268: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Aho-Corasick automaton 12

Consider the example for the alphabet Σ = {a, b, n, o} and patternP = nano. Running the Aho-Corasick automaton M described aboveon the text T = annnaananoaa we get:

i 1 2 3 4 5 6 7 8 9 10 11 12 13T [i] a n n n a a n a n o a astate 0 0 1 1 1 2 0 1 2 3 4 0 0

n a n o

so the match was found at position i − m = 11 − 4 = 7.

COMPSCI 220: Automata and Pattern Matching 118 / 123

Page 269: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Aho-Corasick automaton 12

Consider the example for the alphabet Σ = {a, b, n, o} and patternP = nano. Running the Aho-Corasick automaton M described aboveon the text T = annnaananoaa we get:

i 1 2 3 4 5 6 7 8 9 10 11 12 13T [i] a n n n a a n a n o a astate 0 0 1 1 1 2 0 1 2 3 4 0 0

n a n o

so the match was found at position i − m = 11 − 4 = 7.

COMPSCI 220: Automata and Pattern Matching 118 / 123

Page 270: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Aho-Corasick automaton 12

Consider the example for the alphabet Σ = {a, b, n, o} and patternP = nano. Running the Aho-Corasick automaton M described aboveon the text T = annnaananoaa we get:

i 1 2 3 4 5 6 7 8 9 10 11 12 13T [i] a n n n a a n a n o a astate 0 0 1 1 1 2 0 1 2 3 4 0 0

n a n o

so the match was found at position i − m = 11 − 4 = 7.

COMPSCI 220: Automata and Pattern Matching 118 / 123

Page 271: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Knuth-Morris-Pratt algorithm

Knuth-Morris-Pratt algorithm uses the prefix function associated to thepattern P[1..m]

π : {1, 2, . . . , m} → {0, 1, 2, . . . , m − 1}

defined by

π(q) = max{k : k < q and Pk ≤suffix Pq},

i.e. the length of the shortest prefix of P that is a proper suffix ofPq.The overall running time is

O(m + n).

COMPSCI 220: Automata and Pattern Matching 119 / 123

Page 272: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Regexes 1

In the practice of computing regular expressions (abbreviated as regexor regexp, with plural forms regexes) differ from the Kleene definitiondiscussed before.

Regexes are written in a formal language that can be interpreted by aregular expression processor, a program that either serves as a parsergenerator or examines text and identifies parts that match the providedspecification.

COMPSCI 220: Automata and Pattern Matching 120 / 123

Page 273: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Regexes 1

In the practice of computing regular expressions (abbreviated as regexor regexp, with plural forms regexes) differ from the Kleene definitiondiscussed before.

Regexes are written in a formal language that can be interpreted by aregular expression processor, a program that either serves as a parsergenerator or examines text and identifies parts that match the providedspecification.

COMPSCI 220: Automata and Pattern Matching 120 / 123

Page 274: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Regexes 2

There are various versions of regexes; they provide an expressivepower that exceeds the regular languages.

Here is an example. Regexes have the ability to groupsub-expressions with parentheses and recall the value they match inthe same expression.

Using this feature one can write a pattern that matches strings ofrepeated words like “papatoetoe” (squares). The regex to match“papatoetoe” is

(.∗)\1(.∗)\2,

where \1 =pa and \2 =toe were the sub-matches. The languageassociated to this pattern is not regular.

COMPSCI 220: Automata and Pattern Matching 121 / 123

Page 275: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Regexes 2

There are various versions of regexes; they provide an expressivepower that exceeds the regular languages.

Here is an example. Regexes have the ability to groupsub-expressions with parentheses and recall the value they match inthe same expression.

Using this feature one can write a pattern that matches strings ofrepeated words like “papatoetoe” (squares). The regex to match“papatoetoe” is

(.∗)\1(.∗)\2,

where \1 =pa and \2 =toe were the sub-matches. The languageassociated to this pattern is not regular.

COMPSCI 220: Automata and Pattern Matching 121 / 123

Page 276: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Regexes 2

There are various versions of regexes; they provide an expressivepower that exceeds the regular languages.

Here is an example. Regexes have the ability to groupsub-expressions with parentheses and recall the value they match inthe same expression.

Using this feature one can write a pattern that matches strings ofrepeated words like “papatoetoe” (squares). The regex to match“papatoetoe” is

(.∗)\1(.∗)\2,

where \1 =pa and \2 =toe were the sub-matches. The languageassociated to this pattern is not regular.

COMPSCI 220: Automata and Pattern Matching 121 / 123

Page 277: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Regex and Google search

Although in many cases system administrators can run regex-basedqueries internally, most search engines do not offer regex support tothe public.

With one exception of Google Code Search:

http://www.google.com/codesearch

COMPSCI 220: Automata and Pattern Matching 122 / 123

Page 278: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Regex and Google search

Although in many cases system administrators can run regex-basedqueries internally, most search engines do not offer regex support tothe public.

With one exception of Google Code Search:

http://www.google.com/codesearch

COMPSCI 220: Automata and Pattern Matching 122 / 123

Page 279: COMPSCI 220: Automata and Pattern Matching

Pattern matching

Test distribution

Capacity Room UPI first letter Number of students279 MLT1 A-R 127122 PLT2 S-Z 57

401 184

COMPSCI 220: Automata and Pattern Matching 123 / 123