seungjin choi - postechmlg.postech.ac.kr/~seungjin/courses/automata/handouts/handout03.pdf ·...
TRANSCRIPT
![Page 1: Seungjin Choi - POSTECHmlg.postech.ac.kr/~seungjin/courses/automata/handouts/handout03.pdf · Regular Expressions Seungjin Choi Department of Computer Science and Engineering Pohang](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6d26427f8b9a3b388c9792/html5/thumbnails/1.jpg)
Regular Expressions
Seungjin Choi
Department of Computer Science and EngineeringPohang University of Science and Technology
77 Cheongam-ro, Nam-gu, Pohang 37673, [email protected]
1 / 27
![Page 2: Seungjin Choi - POSTECHmlg.postech.ac.kr/~seungjin/courses/automata/handouts/handout03.pdf · Regular Expressions Seungjin Choi Department of Computer Science and Engineering Pohang](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6d26427f8b9a3b388c9792/html5/thumbnails/2.jpg)
Outline
I Regular expressions: One type of language-defining notation
I Regular expressions and finite automata
I Regular grammars (will be discussed in Lecture 4)
2 / 27
![Page 3: Seungjin Choi - POSTECHmlg.postech.ac.kr/~seungjin/courses/automata/handouts/handout03.pdf · Regular Expressions Seungjin Choi Department of Computer Science and Engineering Pohang](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6d26427f8b9a3b388c9792/html5/thumbnails/3.jpg)
Operators
I Union:L ∪M = {w |w ∈ L or w ∈ M}.
I Concatenation:
LM = {w |w = xy , x ∈ L, y ∈ M}.
I Powers:L0 = {ε}, L1 = L, Ln+1 = LLn.
I Kleene Closure (star-closure):
L∗ = L0 ∪ L1 ∪ L2 · · · = ∪∞i=0Li .
3 / 27
![Page 4: Seungjin Choi - POSTECHmlg.postech.ac.kr/~seungjin/courses/automata/handouts/handout03.pdf · Regular Expressions Seungjin Choi Department of Computer Science and Engineering Pohang](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6d26427f8b9a3b388c9792/html5/thumbnails/4.jpg)
∅i
Question: What are ∅0,∅i ,∅∗?
I ∅0 = {ε}.I ∅i , i ≥ 1 is empty since we cannot select any strings from the empty
set.
I ∅∗ = {ε}.
4 / 27
![Page 5: Seungjin Choi - POSTECHmlg.postech.ac.kr/~seungjin/courses/automata/handouts/handout03.pdf · Regular Expressions Seungjin Choi Department of Computer Science and Engineering Pohang](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6d26427f8b9a3b388c9792/html5/thumbnails/5.jpg)
Regular Expressions
I A FA (DFA or NFA) is a ”blueprint” for constructing a machinerecognizing a regular language. (machine-like description)
I A regular expression is a ”user-friendly” declarative way of describinga regular language. (algebraic description)
I Involves a combination of:I strings of symbols from ΣI parenthesesI operators (+, ·, ∗). (union, concatenation, star-closure)
5 / 27
![Page 6: Seungjin Choi - POSTECHmlg.postech.ac.kr/~seungjin/courses/automata/handouts/handout03.pdf · Regular Expressions Seungjin Choi Department of Computer Science and Engineering Pohang](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6d26427f8b9a3b388c9792/html5/thumbnails/6.jpg)
Examples
I {a, b, c} (regular language) ⇐⇒ a + b + c (regular expression).
I (a + b · c)∗ represents the star-closure of {a} ∪ {bc}, which is{ε, a, bc, abc, bca, bcbc, aaa, . . .}.
I 01∗ + 10∗: The language consisting of all strings that are either asingle 0 followed by any number of 1’s or a single 1 followed by anynumber of 0’s.
I UNIX grep command
I UNIX Lex (lexical analyzer generator) and Flex (fast
lex) tools
6 / 27
![Page 7: Seungjin Choi - POSTECHmlg.postech.ac.kr/~seungjin/courses/automata/handouts/handout03.pdf · Regular Expressions Seungjin Choi Department of Computer Science and Engineering Pohang](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6d26427f8b9a3b388c9792/html5/thumbnails/7.jpg)
Inductive Definition of Regular Expressions
DefinitionLet Σ be a given alphabet. Then
1. ∅, ε, and a ∈ Σ are all regular expressions. These are called primitiveregular expressions.
2. If r1 and r2 are regular expressions, so are r1 + r2, r1 · r2, r∗1 , (r1).
3. A string is a regular expression if and only if it can be derived fromprimitive regular expressions by a finite number of applications of therules described above.
7 / 27
![Page 8: Seungjin Choi - POSTECHmlg.postech.ac.kr/~seungjin/courses/automata/handouts/handout03.pdf · Regular Expressions Seungjin Choi Department of Computer Science and Engineering Pohang](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6d26427f8b9a3b388c9792/html5/thumbnails/8.jpg)
Language L(r)
DefinitionThe language L(r) denoted by any regular expression r , is defined by thefollowing rules:
1. ∅ is a regular expression denoting the empty set.
2. ε is a regular expression denoting {ε}.3. For every a ∈ Σ, a is a regular expression denoting {a}.4. L(r1 + r2) = L(r1) ∪ L(r2).
5. L(r1r2) = L(r1)L(r2).
6. L((r1)) = L(r1).
7. L(r∗1 ) = (L(r1))∗.
8 / 27
![Page 9: Seungjin Choi - POSTECHmlg.postech.ac.kr/~seungjin/courses/automata/handouts/handout03.pdf · Regular Expressions Seungjin Choi Department of Computer Science and Engineering Pohang](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6d26427f8b9a3b388c9792/html5/thumbnails/9.jpg)
Example
Exhibit the language L (a∗ · (a + b)) in set notation.
Solution.
L (a∗ · (a + b)) = L(a∗)L(a + b)
= (L(a))∗ (L(a) ∪ L(b))
= {ε, a, aa, aaa, . . .} · {a, b}= {a, aa, aaa, . . . , b, ab, aab, . . .}.
9 / 27
![Page 10: Seungjin Choi - POSTECHmlg.postech.ac.kr/~seungjin/courses/automata/handouts/handout03.pdf · Regular Expressions Seungjin Choi Department of Computer Science and Engineering Pohang](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6d26427f8b9a3b388c9792/html5/thumbnails/10.jpg)
Another Example
Given Σ = {a, b} and r = (a + b)∗(a + bb), find L(r).
L(r) = L ((a + b)∗) L(a + bb)
= (L(a + b))∗ (L(a) ∪ L(bb))
= (L(a) ∪ L(b))∗(L(a) ∪ L2(b)
)= {a, b}∗ ({a} ∪ {bb})= {a, b}∗{a, bb}= {ε, a, b, aa, ab, ba, bb, aaa, aab, . . .}{a, bb}= {a, bb, aa, abb, ba, bbb, . . .}.
(a + b)∗ represents any string of a’s and b’s.
{a}∗ = {ε, a, aa, . . .}.
10 / 27
![Page 11: Seungjin Choi - POSTECHmlg.postech.ac.kr/~seungjin/courses/automata/handouts/handout03.pdf · Regular Expressions Seungjin Choi Department of Computer Science and Engineering Pohang](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6d26427f8b9a3b388c9792/html5/thumbnails/11.jpg)
More Examples
1. Given r = (aa)∗(bb)∗b, determine L(r).
The set of strings with an even number of a’s followed by an odd numberof b’s, i.e.,
L(r) = {a2nb2m+1 | n ≥ 0,m ≥ 0}.
2. For Σ = {0, 1}, give a regular expression r such that
L(r) = {w ∈ Σ∗ |w has at least one pair of consecutive zeros}.
r = (0 + 1)∗00(0 + 1)∗.
3. Find a regular expression for
L = {w ∈ {0, 1}∗ |w has no pair of consecutive zeros}.
r = (1∗011∗)∗ (0 + ε)︸ ︷︷ ︸ending in 0
+ 1∗(0 + ε)︸ ︷︷ ︸all 1′s
. Alternatively, we see L as the
repetitions of the strings 1 and 01, i.e., r = (1 + 01)∗(0 + ε).
11 / 27
![Page 12: Seungjin Choi - POSTECHmlg.postech.ac.kr/~seungjin/courses/automata/handouts/handout03.pdf · Regular Expressions Seungjin Choi Department of Computer Science and Engineering Pohang](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6d26427f8b9a3b388c9792/html5/thumbnails/12.jpg)
Connection between Regular Expressions and RegularLanguages
For every regular language, there is a regular expression and vice versa.
TheoremLet r be a regular expression. Then, there exists some NFA that acceptsL(r). Consequently L(r) is a regular language.
Proof is shown in the next several slides!
12 / 27
![Page 13: Seungjin Choi - POSTECHmlg.postech.ac.kr/~seungjin/courses/automata/handouts/handout03.pdf · Regular Expressions Seungjin Choi Department of Computer Science and Engineering Pohang](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6d26427f8b9a3b388c9792/html5/thumbnails/13.jpg)
ProofWe begin with automata that accept languages for simple regularexpressions ∅, ε, and a ∈ Σ.
q0 q1
q0 q1e
q0 q1a
M(r)
NFA accepts ∅
NFA accepts ε
NFA accepts a
NFA accepts L(r)
13 / 27
![Page 14: Seungjin Choi - POSTECHmlg.postech.ac.kr/~seungjin/courses/automata/handouts/handout03.pdf · Regular Expressions Seungjin Choi Department of Computer Science and Engineering Pohang](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6d26427f8b9a3b388c9792/html5/thumbnails/14.jpg)
ǫ
ǫ
ǫ
ǫ
M(r1)
M(r2)
Automaton for L(r1 + r2)
ǫǫ ǫM(r1) M(r2)
∗
Automaton for L(r1r2)
14 / 27
![Page 15: Seungjin Choi - POSTECHmlg.postech.ac.kr/~seungjin/courses/automata/handouts/handout03.pdf · Regular Expressions Seungjin Choi Department of Computer Science and Engineering Pohang](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6d26427f8b9a3b388c9792/html5/thumbnails/15.jpg)
ǫ
ǫǫ
ǫ
M(r1)
Automaton for L(r∗1 )
W.l.g we consider a NFA with a single final state.
Just like building automata for L(r1 + r2), L(r1r2), L(r∗1 ), we can buildautomata for arbitrary regular expressions. �
15 / 27
![Page 16: Seungjin Choi - POSTECHmlg.postech.ac.kr/~seungjin/courses/automata/handouts/handout03.pdf · Regular Expressions Seungjin Choi Department of Computer Science and Engineering Pohang](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6d26427f8b9a3b388c9792/html5/thumbnails/16.jpg)
Example
Find an NFA which accepts L(r) where r = (a + bb)∗(ba∗ + ε).
16 / 27
![Page 17: Seungjin Choi - POSTECHmlg.postech.ac.kr/~seungjin/courses/automata/handouts/handout03.pdf · Regular Expressions Seungjin Choi Department of Computer Science and Engineering Pohang](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6d26427f8b9a3b388c9792/html5/thumbnails/17.jpg)
Generalized Transition Graph
Generalized transition graph is a transition graph whose edges are labeledwith regular expressions.
c∗
a+ b
a
L (a∗ + a∗(a + b)c∗)
17 / 27
![Page 18: Seungjin Choi - POSTECHmlg.postech.ac.kr/~seungjin/courses/automata/handouts/handout03.pdf · Regular Expressions Seungjin Choi Department of Computer Science and Engineering Pohang](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6d26427f8b9a3b388c9792/html5/thumbnails/18.jpg)
Remove State q
qi q qj
a b
cde
qi qj
ae∗b
ce∗d ce∗bae∗d
After removing state q
18 / 27
![Page 19: Seungjin Choi - POSTECHmlg.postech.ac.kr/~seungjin/courses/automata/handouts/handout03.pdf · Regular Expressions Seungjin Choi Department of Computer Science and Engineering Pohang](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6d26427f8b9a3b388c9792/html5/thumbnails/19.jpg)
Canonical Form
qi qj
r1
r2
r3 r4
Associated regular expression is r = r∗1 r2 (r4 + r3r∗1 r2)∗.
19 / 27
![Page 20: Seungjin Choi - POSTECHmlg.postech.ac.kr/~seungjin/courses/automata/handouts/handout03.pdf · Regular Expressions Seungjin Choi Department of Computer Science and Engineering Pohang](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6d26427f8b9a3b388c9792/html5/thumbnails/20.jpg)
Regular Language and Regular Expression
TheoremLet L be a regular language. Then there exists a regular expression r suchthat L = L(r).
Proof. Let M be an NFA that accepts L. W.l.g. we can assume that Mhas only one final state and that q0 /∈ F . We interpret the graph M as ageneralized transition graph and apply the construction (illustrated inprevious slides) to it. We use the method of removing state q. Wecontinue this process, removing one state after the other, until we reachthe canonical form. Then the regular expression is of the form in theprevious slide. �
20 / 27
![Page 21: Seungjin Choi - POSTECHmlg.postech.ac.kr/~seungjin/courses/automata/handouts/handout03.pdf · Regular Expressions Seungjin Choi Department of Computer Science and Engineering Pohang](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6d26427f8b9a3b388c9792/html5/thumbnails/21.jpg)
ExampleFind a regular expression for
L = {w ∈ {a, b}∗ | na(w) is even and nb(w) is odd}.
The associate DFA is given by
EE OE
OOEO
a
a
a
a
b bbb
21 / 27
![Page 22: Seungjin Choi - POSTECHmlg.postech.ac.kr/~seungjin/courses/automata/handouts/handout03.pdf · Regular Expressions Seungjin Choi Department of Computer Science and Engineering Pohang](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6d26427f8b9a3b388c9792/html5/thumbnails/22.jpg)
The canonical graph is of the form
EE EO
aa+ ab(bb)∗ba
b+ ab(bb)∗a
b+ a(bb)∗ba a(bb)∗a
22 / 27
![Page 23: Seungjin Choi - POSTECHmlg.postech.ac.kr/~seungjin/courses/automata/handouts/handout03.pdf · Regular Expressions Seungjin Choi Department of Computer Science and Engineering Pohang](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6d26427f8b9a3b388c9792/html5/thumbnails/23.jpg)
Algebraic Laws: Commutativity and Associativity
I Commutative law for union: r1 + r2 = r2 + r1I Commutative law for concatenation: r1r2 6= r2r1I Associative law for union: (r1 + r2) + r3 = r1 + (r2 + r3)
I Associative law for concatenation: (r1r2)r3 = r1(r2r3)
23 / 27
![Page 24: Seungjin Choi - POSTECHmlg.postech.ac.kr/~seungjin/courses/automata/handouts/handout03.pdf · Regular Expressions Seungjin Choi Department of Computer Science and Engineering Pohang](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6d26427f8b9a3b388c9792/html5/thumbnails/24.jpg)
Algebraic Laws: Identities and Annihilators
I Identities (∅ and ε)
∅ + r = r + ∅ = r
ε r = r ε = r
I Annihilator (∅)
∅ r = r ∅ = ∅
24 / 27
![Page 25: Seungjin Choi - POSTECHmlg.postech.ac.kr/~seungjin/courses/automata/handouts/handout03.pdf · Regular Expressions Seungjin Choi Department of Computer Science and Engineering Pohang](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6d26427f8b9a3b388c9792/html5/thumbnails/25.jpg)
Algebraic Laws: Distributive Laws
I Left distributive lat of concatenation over union
r1(r2 + r3) = r1r2 + r1r3
I Right distributive law of concatenation over union
(r2 + r3)r1 = r2r1 + r3r1
25 / 27
![Page 26: Seungjin Choi - POSTECHmlg.postech.ac.kr/~seungjin/courses/automata/handouts/handout03.pdf · Regular Expressions Seungjin Choi Department of Computer Science and Engineering Pohang](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6d26427f8b9a3b388c9792/html5/thumbnails/26.jpg)
Algebraic Laws: Idempotent Law
DefinitionAn operator is said to be idempotent if the result of applying it to two ofthe same values as arguments is that that value.
I Common arithmetic operators are not idempotent, i.e.,
x + x 6= x , x × x 6= x .
I In regular expressions, idempotent law for union
r + r = r
26 / 27
![Page 27: Seungjin Choi - POSTECHmlg.postech.ac.kr/~seungjin/courses/automata/handouts/handout03.pdf · Regular Expressions Seungjin Choi Department of Computer Science and Engineering Pohang](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6d26427f8b9a3b388c9792/html5/thumbnails/27.jpg)
Algebraic Laws: Laws Involving Closures
I (r∗)∗ = r∗
I ∅∗ = ε
I ε∗ = ε
I r+ = rr∗ = r∗r
I r∗ = r+ + ε
27 / 27