”Automaten und Formale Sprachen“
alias
”Theoretische Informatik“
Sommersemester 2014
Dr. Sander Bruggink
Teaching assistant: Jan Stuckrath
Sander Bruggink Automaten und Formale Sprachen 1
Organisational Stuff and Introduction
Who are we?
Teacher: Dr. Sander Bruggink
Room LF 265
E-Mail: [email protected]
Teaching assistant: Jan Stuckrath
Room LF 265
E-Mail: [email protected]
Tutors: Lars Stoltenow / Martin Weber
Lars Stoltenow: [email protected]
Martin Weber: [email protected]
Introduction
Who are you?
BAI
ISE
Others
Website
www.ti.inf.uni-due.de/teaching/ss2013/afs/
Moodle-Site
Appointments
Lecture:
Tuesday, 12pm–2pm, room LB 131
Exercise groups:
Group ISE: Tuesday, 8am–10am, Room LE 120 (English), Martin Weber
Group BAI-1: Wednesday, 12pm–2pm, Room LE 120, Lars Stoltenow
Group BAI-2: Thursday, 12pm–2pm, Room LF 125, Lars Stoltenow
Group BAI-3: Thursday, 4pm–6pm, Room LC 137, Martin Weber
Group BAI-4: Friday, 10am–12pm, Room LC 137, Jan Stuckrath
Advice about the exercises
Please try to split evenly among the exercise groups.
Visit the exercise groups and do the homework. The material of this lecture can only be mastered by frequent practice. Memorizing doesn’t help much.
The exercise groups start in the third week of the semester. Thus, the first exercise groups take place from 22 to 25 April.
Advice about the exercises
The exercise sheet is put online every week on Tuesday at the latest.
The written exercises must be handed in by Tuesday, 8am of the following week. In that week the exercise sheet is discussed in the exercise groups.
Handing in:
in the letter box adjacent to room LF 259, or
online through Moodle.
Please write your name, student number and group number clearly on your solution. Also write down the name of the lecture.
Exam
Oral exam in the module “Theoretische Informatik” (“Automaten und formale Sprachen” together with “Berechenbarkeit und Komplexität”)
For: BAI
Students who started this summer semester may choose to take the two oral exams of the module separately.
Written exam
For: BAI (PO 2012), ISE, minor-subject students
Bonus points
Bonus points:
During the semester there will be 12 (or 11) exercise sheets of 20 points each.
If you obtain 50% of the points, your exam grade improves by one grade step (for example 2.0 instead of 2.3).
You can obtain 10 extra bonus points by publicly presenting the answer to an exercise in the exercise group (this is possible only once).
For the oral exam of the module “Theoretische Informatik” you must obtain the bonus in both “Automaten und Formale Sprachen” and “Berechenbarkeit und Komplexität”.
Literature
We use the following book:
Uwe Schöning: Theoretische Informatik – kurzgefaßt. Spektrum, 2008 (5th edition).
Other relevant literature:
New edition of an old classic:
Hopcroft, Motwani, Ullman: Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, 2001.
Organisational Stuff and Introduction Motivation / Introduction
Informal Overview
Automata
Finite representations of languages: finite automata, pushdown automata, (Turing machines), ...
Other methods to represent languages finitely: grammars, regular expressions
and formal languages
Language = set of finite sequences of symbols (= words)
For example:
Set of arithmetical expressions
Set of syntactically correct Java programs
Set of all German sentences
Set of satisfiable logical formulas
Motivation: Vending Machine
[Figure: coins of 50 cent, 20 cent and 10 cent. Image: Wikipedia]
Paying 70 cents with 50, 20 and 10 cent coins.
Motivation: Vending Machine
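Automaton:
[Figure: automaton with the states 0, 10, 20, 30, 40, 50, 60 and 70. Each state is the amount paid so far. Transitions are labelled with the coins 10, 20 and 50 and lead to the state for the new amount, as long as it does not exceed 70.]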
Language:
Sequences of coins (10, 20 and 50 cent) that are worth 70 cents.
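The vending machine can be sketched in a few lines of Python. This is only an illustration under assumptions: the names `COINS`, `TARGET`, `step` and `accepts` are hypothetical and do not come from the lecture, and coin sequences are modelled as lists of integers.

```python
# States are the amounts paid so far (in cents); 70 is the only accepting state.
COINS = (10, 20, 50)
TARGET = 70


def step(state, coin):
    """Transition function: return the next state, or None if no transition exists."""
    if coin not in COINS:
        raise ValueError(f"unknown coin: {coin}")
    nxt = state + coin
    return nxt if nxt <= TARGET else None


def accepts(coins):
    """Return True if the coin sequence is worth exactly 70 cents."""
    state = 0
    for coin in coins:
        state = step(state, coin)
        if state is None:  # the coin would overpay: the word is rejected
            return False
    return state == TARGET
```

For instance, `accepts([50, 20])` and `accepts([20, 50])` both hold, because the order of the coins matters for the word but not for its value.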
Adventure Problem
Warming up: we consider the adventure problem, in which an adventurer searches for a path through an adventure.
(Later we will find out what this has to do with formal languages.)
[Figure: example adventure, a map with fields numbered 1 to 16.]
Adventure Problem (Level 1)
Rules of the Adventure Problem:
The Treasure Rule
You must find at least two treasures.
The Door Rule
You can only go through a door if you have found a key before. (The key can be used arbitrarily many times.)
Adventure Problem (Level 1)
The Dragon Rule
Immediately after the encounter with a dragon, you must jump into a river, because otherwise the dragon will set you on fire. This no longer applies if you have previously found a sword, because then you can kill the dragon.
Remark: Dragons, treasures and keys are “refilled” after you have left the corresponding field.
We are looking for a path from a start state to an end state that satisfies all of the above conditions.
Adventure Problem (Level 1)
Question (Level 1)
Is there a solution in the example adventure?
Yes! The shortest solution is: 1, 2, 3, 1, 2, 4, 10, 4, 5, 6, 4, 5, 6, 4, 11, 12 (length 16).
Is there a general solving procedure which, given an adventure in the form of a graph, can always determine whether there is a solution?
Yes! We will see this procedure in the lecture.
In order to be able to implement this procedure, we also need a formal description of the rules (door rule, dragon rule, treasure rule).
Adventure Problem (Level 2)
New Door Rule
The keys are magical and disappear immediately after being used to opena door. As soon as you go through a door, the door is locked again.
However, you can carry more than one key.
Adventure Problem (Level 2)
Questions (Level 2)
Is there a solution in the example adventure?
Yes! The shortest solution is: 1, 2, 3, 1, 2, 4, 10, 4, 7, 8, 9, 4, 7, 8, 9, 4, 11, 12 (length 18).
Is there a general solving procedure?
Yes! We will see this procedure in the lecture.
Why is the new problem harder?
We have to “count” the keys.
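The difference between the two door rules can be illustrated with a small sketch. It assumes, purely for illustration, that a path is given as a list of the objects encountered, written as the strings "key" and "door" (all other objects are ignored); the function names are hypothetical.

```python
def door_rule_level1(objects):
    """Level 1: once any key has been found, every door can be passed.

    A single yes/no flag is enough memory.
    """
    has_key = False
    for obj in objects:
        if obj == "key":
            has_key = True
        elif obj == "door" and not has_key:
            return False
    return True


def door_rule_level2(objects):
    """Level 2: every door uses up one key, so the keys have to be counted.

    The counter is not bounded in advance.
    """
    keys = 0
    for obj in objects:
        if obj == "key":
            keys += 1
        elif obj == "door":
            if keys == 0:
                return False
            keys -= 1
    return True
```

The level 1 checker needs only finitely many states, while the level 2 checker needs a counter that can grow without bound. This is the informal reason why the second problem is harder.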
Adventure Problem (Expert Level)
New Dragon Rule
Swords become unusable through the dragon’s blood as soon as one has killed a dragon. However, dragons are replaced after being killed.
Key Rule
A magic gate can only be passed if you do not own a key.
Sword Rule
A river can only be passed if you do not have a sword (otherwise, you’ll drown).
Adventure Problem (Expert Level)
Questions (Expert Level)
Is there a solution in the example adventure?
Yes! The shortest solution is: 1, 2, 3, 1, 2, 4, 10, 4, 7, 8, 9, 4, 10, 4, 5, 6, 4, 11, 12 (length 19).
Is there a general solving procedure?
No! It is a so-called undecidable problem.
This is not discussed in this lecture, but in “Berechenbarkeit und Komplexität”.
Adventure Problem and Formal Languages
Automata:
Adventure instances
Automaton that accepts possible solutions
Languages:
Possible object sequences of an adventure instance
Object sequences that satisfy the treasure rule
Object sequences that satisfy the dragon rule
Object sequences that satisfy the door rule
Formal Languages
Questions
Typical questions here are:
Is a language L empty or does it contain (at least) one word? L = ∅?
Is a word w in the language? w ∈ L?
Are two languages included in one another? L1 ⊆ L2?
Depending on the language (or languages), these questions are either
decidable (there is a general procedure to solve the problem) or
undecidable
Adventure Problem and Formal Languages
The individual levels of the adventure belong to the following language classes:
Level 1 → regular languages
Level 2 → context-free languages
Expert level → Chomsky-0 languages (semi-decidable languages)
These are discussed in “Berechenbarkeit & Komplexität”.
Organisational Stuff and Introduction Contents of the Lecture
For theoretical computer science
How can infinite structures be represented by finite descriptions (automata, grammars)?
There are numerous applications, for example in the following areas:
searching in texts (regular expressions)
syntax of (programming) languages and compiler construction
modelling system behaviour
verification of systems
Contents of the lecture
Automata and formal languages
Mathematical foundations and formal proofs
Languages, grammars and automata
Chomsky Hierarchy (different language classes)
Regular languages and context-free languages
How can we show that a language is not of a certain class?
Decision procedures
Closure properties (is the intersection of two regular languages also regular?)
Organisational Stuff and Introduction Mathematical Foundations and Formal Proofs
Sets
Set
A set M of elements can be written as an enumeration
M = {0, 2, 4, 6, 8, ...}
or as the set of elements with a certain property
M = {n | n ∈ N and n even}
General format:
M = {x | P(x)}
(M is the set of all elements x that satisfy property P.)
Sets
Remarks:
The elements of a set are unordered, that is, their order is not important. For example:
{1, 2, 3} = {1, 3, 2} = {2, 1, 3} = {2, 3, 1} = {3, 1, 2} = {3, 2, 1}
An element cannot occur in a set more than once. It is either in the set or not. For example:
{1, 2, 3} ≠ {1, 2, 3, 4} = {1, 2, 3, 4, 4}
Sets
Element of a set
We write a ∈ M, when an element a is contained in the set M.
Number of elements of a set
For a set M the number of elements of M is denoted by |M|.
Empty set
The empty set (set without elements) is denoted by ∅.
Subset
We write A ⊆ B when every element of A is also an element of B. A is then called a subset of B. The relation ⊆ is also called inclusion.
Sets
Example:
2 ∈ {1, 2, 3}? ✓
2 ⊆ {1, 2, 3}? ✗ ⇒ {2} ⊆ {1, 2, 3} ✓
{1, 2} ∈ {1, 2, 3}? ✗
{1, 2} ⊆ {1, 2, 3}? ✓
∅ ∈ {A, B, C}? ✗
∅ ⊆ {A, B, C}? ✓
Sets can also contain sets:
1 ∈ {{1}, {3, 4}}? ✗
{1} ∈ {{1}, {3, 4}}? ✓
{1} ⊆ {{1}, {3, 4}}? ✗
{{1}} ⊆ {{1}, {3, 4}}? ✓
Important: a ≠ {a}
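The difference between ∈ and ⊆ can be tried out in Python. This is a sketch, not part of the lecture: Python sets may only contain hashable values, so the inner sets are modelled as `frozenset`, and the variable names are made up.

```python
# Inner sets must be hashable in Python, hence frozenset instead of set.
m = {1, 2, 3}
nested = {frozenset({1}), frozenset({3, 4})}

element_2 = 2 in m                          # 2 ∈ {1, 2, 3}
subset_2 = {2} <= m                         # {2} ⊆ {1, 2, 3}
pair_as_element = frozenset({1, 2}) in m    # {1, 2} ∈ {1, 2, 3}
empty_subset = set() <= m                   # ∅ ⊆ M holds for every set M
one_in_nested = 1 in nested                 # 1 ∈ {{1}, {3, 4}}
set_in_nested = frozenset({1}) in nested    # {1} ∈ {{1}, {3, 4}}
set_sub_nested = {frozenset({1})} <= nested  # {{1}} ⊆ {{1}, {3, 4}}
```

The operator `in` corresponds to ∈ and `<=` on sets corresponds to ⊆, so the distinction between a and {a} shows up as the distinction between `1` and `frozenset({1})`.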
Venn Diagrams
Venn diagrams are graphical representations of sets and the relationships between them.
[Figure: Venn diagrams of two sets A and B.]
Set operations
Union: A ∪ B = {e | e ∈ A or e ∈ B}
Intersection: A ∩ B = {e | e ∈ A and e ∈ B}
Difference: A \ B = {e | e ∈ A and e ∉ B}
[Figure: Venn diagrams of A ∪ B, A ∩ B and A \ B.]
Power set
Power set
Let M be a set. The set P(M) is the set of all subsets of M.
P(M) = {A | A ⊆ M}
We have: |P(M)| = 2^|M| (for a finite set M).
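For small finite sets the power set can be computed directly. The following sketch uses the standard library module `itertools`; the function name `power_set` is an assumption made for this example, and subsets are returned as `frozenset` so that they can themselves be elements of a set.

```python
from itertools import chain, combinations


def power_set(m):
    """Return the set of all subsets of the finite set m (as frozensets)."""
    elements = list(m)
    # Collect the subsets of size 0, 1, ..., |m|.
    subsets = chain.from_iterable(
        combinations(elements, k) for k in range(len(elements) + 1)
    )
    return {frozenset(s) for s in subsets}
```

Calling `len(power_set({1, 2, 3}))` gives 8, which agrees with the formula for a set of three elements.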
Tuples
Tuple
Besides sets we also use tuples, which are written with (round) parentheses: (a1, ..., an)
In a tuple the elements are ordered. For example:
(1, 2, 3) ≠ (1, 3, 2)
An element can occur multiple times in a tuple. Tuples of different size are always unequal. For example:
(1, 2, 3, 4) ≠ (1, 2, 3, 4, 4)
A tuple (a1, ..., an) consisting of n elements is called an n-tuple. A 2-tuple is also called a pair.
Cross Product
Cross product (or cartesian product)
Let A, B be two sets. The set A × B is the set of all pairs (a, b), where a is an element of A and b an element of B.
A × B = {(a, b) | a ∈ A, b ∈ B}
We have: |A × B| = |A| · |B| (for finite sets A, B).
Relations
Binary relation
Let A, B be two sets. A binary relation between A and B is a set of pairs R ⊆ A × B.
Example:
A = {f, g, h}
B = {1, 2, 3, 4}
R = {(f, 1), (f, 2), (g, 4), (h, 2)}
[Figure: the elements of A and B, with an arrow from x to y for every pair (x, y) ∈ R.]
The sets A and B can also be equal.
Relation over A
Let A be a set. A (binary) relation over A is a set of pairs R ⊆ A × A.
Example:
A = {1, 2, 3, 4, 5, 6}
R = {(1, 3), (1, 2), (2, 6), (3, 6), (4, 4), (5, 1), (5, 5), (5, 6)}
[Figure: the elements of A, with an arrow from x to y for every pair (x, y) ∈ R.]
Properties of Relations
Let R ⊆ A × A be a relation over A.
R is reflexive when for all x ∈ A: x R x.
[Figure: an element a with a loop.]
R is irreflexive when for all x ∈ A: not x R x.
[Figure: an element a with a crossed-out loop.]
There are relations that are neither reflexive nor irreflexive.
Properties of Relations (Continued)
Let R ⊆ A × A be a relation over A.
R is symmetric when for all x, y ∈ A it holds that if x R y, then y R x.
[Figure: a and b connected by arrows in both directions.]
R is antisymmetric when for all x, y ∈ A it holds that if x R y and y R x, then x = y.
[Figure: a and b connected by arrows in both directions, crossed out (where a ≠ b).]
R is transitive when for all x, y, z ∈ A it holds that if x R y and y R z, then x R z.
[Figure: arrows from a to b, from b to c and from a to c.]
Special Relations
A quasi-order (or pre-order) is a reflexive, transitive relation.
An order is a reflexive, transitive and antisymmetric relation.
An equivalence relation is a reflexive, transitive and symmetric relation.
[Figure: three example relations over {a, b, c, d}: a quasi-order, an order and an equivalence relation.]
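For finite relations the properties above can be checked mechanically. The sketch below represents a relation over A as a Python set of pairs; all function names and the two example relations (`leq` and `same_parity`) are assumptions made for illustration.

```python
def is_reflexive(r, a):
    return all((x, x) in r for x in a)


def is_symmetric(r):
    return all((y, x) in r for (x, y) in r)


def is_antisymmetric(r):
    return all(x == y for (x, y) in r if (y, x) in r)


def is_transitive(r):
    return all((x, z) in r for (x, y) in r for (u, z) in r if y == u)


def is_quasi_order(r, a):
    return is_reflexive(r, a) and is_transitive(r)


def is_order(r, a):
    return is_quasi_order(r, a) and is_antisymmetric(r)


def is_equivalence(r, a):
    return is_quasi_order(r, a) and is_symmetric(r)


# Two example relations over A = {1, 2, 3, 4}.
a = {1, 2, 3, 4}
leq = {(x, y) for x in a for y in a if x <= y}
same_parity = {(x, y) for x in a for y in a if x % 2 == y % 2}
```

Here `leq` is an order but not an equivalence relation, and `same_parity` is an equivalence relation but not an order. Both are quasi-orders.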
Functions
Let R ⊆ A× B be a relation from A to B.
R is total when for all x ∈ A there exists at least one y ∈ B such that x R y.
R is non-ambiguous when for all x ∈ A there exists at most one y ∈ B such that x R y.
Function
A function f : A → B is a total and non-ambiguous relation from A to B.
A function maps an element a ∈ A to an element f(a) ∈ B. Here, A is the domain and B the codomain.
Notation: f(x) = y when x f y.
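Whether a finite relation is a function can be checked with a short sketch. The relation `r` below is the example relation between A = {f, g, h} and B = {1, 2, 3, 4} from the slide on binary relations; the function names and the second relation `g_rel` are assumptions made for this example, and the code does not check that the pairs really lie in A × B.

```python
def is_total(r, a):
    """Every x in a is related to at least one y."""
    return all(any(x == u for (u, _) in r) for x in a)


def is_non_ambiguous(r):
    """Every x is related to at most one y."""
    images = {}
    for x, y in r:
        # setdefault stores y on the first visit and returns the stored value afterwards.
        if images.setdefault(x, y) != y:
            return False
    return True


def is_function(r, a):
    return is_total(r, a) and is_non_ambiguous(r)


a = {"f", "g", "h"}
r = {("f", 1), ("f", 2), ("g", 4), ("h", 2)}   # f is related to both 1 and 2
g_rel = {("f", 1), ("g", 4), ("h", 2)}          # hypothetical repaired relation
```

The relation `r` is total but ambiguous, so it is not a function. Removing the pair (f, 2) yields `g_rel`, which is a function from A to B.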
Mathematical Statements
Statements (propositions) make assertions.
Basically, the language of mathematical statements is the language of (classical) predicate logic.
A statement is either true or false.
(Law of excluded middle / tertium non datur)
As a basic principle, we only hold an assertion to be true if we can prove it.
(Exception: axioms)
The following kinds of assertions exist:
Atomic assertions
2 is a prime number                   Prime(2)
x is larger than 5                    x > 5
Operators
not P                                 ¬P
P and Q                               P ∧ Q
P or Q                                P ∨ Q
when P, then Q                        P → Q
P if and only if Q                    P ↔ Q
Quantifiers
there exists an x such that P         ∃x P
for all x P                           ∀x P
Translation of natural language to predicate logic
Translations:
All P are Q.                          ∀x (P(x) → Q(x))
There exists a P such that Q.         ∃x (P(x) ∧ Q(x))
Not all operators and quantifiers are explicitly given!
Examples:
“There is a prime number that is even.”
“All prime numbers are odd.”
“Let x be an even prime number. Then x < 10.”
“If x is an even number larger than 3, then x is not a prime number.”
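One possible translation of the four example sentences is sketched below. The predicate names Prime, Even and Odd are assumptions chosen for this sketch. The third and fourth sentence contain an implicit universal quantifier over x.

```latex
\begin{align*}
&\exists x\, \bigl(\mathit{Prime}(x) \land \mathit{Even}(x)\bigr) \\
&\forall x\, \bigl(\mathit{Prime}(x) \to \mathit{Odd}(x)\bigr) \\
&\forall x\, \bigl((\mathit{Even}(x) \land \mathit{Prime}(x)) \to x < 10\bigr) \\
&\forall x\, \bigl((\mathit{Even}(x) \land x > 3) \to \lnot \mathit{Prime}(x)\bigr)
\end{align*}
```

Note that the second formula is a false statement (2 is an even prime number); translating a sentence and deciding whether it is true are separate steps.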
Proving Assertions
We will now concentrate on how mathematical assertions of different types can be proved.
Why do we prove an assertion?
Answer: to convince ourselves and others that the assertion is true.
Proving Assertions
During the process of writing a proof there are various kinds of assertions:
assertions that can be used:
  assertions that have already been proved (propositions, theorems, lemmas, ...);
  assertions that we have assumed to be true (hypotheses, premises, assumptions);
  axioms (assertions that we immediately recognize as being true);
assertions that we have to prove or refute.
Implication
Implication (“if, then”)
“If P, then Q” (P → Q) is true when Q follows from P.
Using: If P is known and P → Q is known, then Q is known (modus ponens).
Proving: To prove P → Q, assume P and show, under this assumption, that Q is true.
Refuting: To refute P → Q, show that P is true but Q is not.
Conjunction
Conjunction (“and”)
“P and Q” (P ∧ Q) is true when P and Q are both true.
Using: When P ∧ Q is known, then P and Q are also known separately (and can be used as premises).
Proving: To prove P ∧ Q, we have to prove P and prove Q.
Refuting: To refute P ∧ Q, we have to refute P or refute Q.
Disjunction
Disjunction (“or”)
“P or Q” (P ∨ Q) is true when P is true or Q is true.
Refuting: To refute P ∨ Q, one must refute P and refute Q.
Using: To prove R from P ∨ Q:
Assume P, and show under that assumption that R is true.
Assume Q, and show under that assumption that R is true.
Since R follows from both P and Q, and P or Q holds, R must also hold.
Proving: To prove P ∨ Q, you have to prove P or you have to prove Q.
Negation
Negation (“not”)
“Not P” (¬P) is true, when P is not true, and false, when P is true.
Using: Negations can be used to prove contradictions.
Proving: You prove ¬P by refuting P.
(In many cases you need a proof by contradiction.)
Refuting: You refute ¬P by proving P.
Universal quantifier
Universal quantifier (“for all”)
“For all x it holds that P” (∀x P) is true when P holds for all objects x.
Using: When ∀x P is known and you have an object a, you know that P holds for a (i.e. P[x/a] is true, where P[x/a] is P with all occurrences of x replaced by a).
Proving: Assume that a is an arbitrary object. Show that P holds for a.
You cannot assume anything about a!
Refuting: Search for a counterexample, that is, an object a such that P doesn’t hold for a.
Existential quantifier
Existential quantifier (“there is a”)
“There is an x such that P” (∃x P) is true when an object x exists for which P holds.
Using: When ∃x P is known, you may introduce an object for which P holds (with an arbitrary name).
You cannot assume anything else about this object.
Proving: Search for an example: an object a for which P is true.
Refuting: Assume that a is an arbitrary object and show that P does not hold for a.
You cannot assume anything else about a.
Example
Prove the following theorem:
Theorem
Let A be a set, and ⪯ ⊆ A × A a quasi-order on A. Define the relation ≈ as follows:
x ≈ y whenever x ⪯ y and y ⪯ x.
Then ≈ is an equivalence relation.
Example of a counterexample
Jan claims that the following assertion is true:
Jan’s “Theorem”
Let R ⊆ A × A be a binary relation. When R is symmetric and transitive, then R is reflexive.
Jan argues as follows: from a R b it follows by symmetry that b R a, and thus by transitivity that a R a.
All persons are fictitious. Any resemblance to real persons, living or dead, is purely coincidental.
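The claim can be refuted with a counterexample. The argument only works for elements a that are related to some b; an element that is related to nothing escapes it. The sketch below checks one such counterexample, chosen here for illustration: A = {1, 2} and R = {(1, 1)}.

```python
a = {1, 2}
r = {(1, 1)}  # the element 2 is not related to anything

symmetric = all((y, x) in r for (x, y) in r)
transitive = all((x, z) in r for (x, y) in r for (u, z) in r if y == u)
reflexive = all((x, x) in r for x in a)
```

In this example `symmetric` and `transitive` hold, but `reflexive` does not, because (2, 2) is missing from R.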
Subsets
By definition, “A ⊆ B” means the same thing as “for all x ∈ A it holds that x ∈ B”.
Proving A ⊆ B: Assume that x is an arbitrary element of A. Show, under this assumption, that x ∈ B. You cannot assume anything else about x!
Refuting A ⊆ B: Search for a counterexample, that is, an object x ∈ A such that x ∉ B.
Equality of sets
Two sets A and B are equal, when A ⊆ B and B ⊆ A.
Proving A = B: Prove A ⊆ B and prove B ⊆ A.
Refuting A = B: Search for a counterexample, that is, an object x ∈ A such that x ∉ B, or an object x ∈ B such that x ∉ A.
Example
Prove the following theorem:
Theorem (Law of distributivity)
For sets A,B,C it holds that:
(A ∩ B) ∪ C = (A ∪ C ) ∩ (B ∪ C )
Proof by contradiction
Proof by contradiction (Reductio ad absurdum)
Prove an assertion P by assuming its negation and then deducing a contradiction.
Example: Prove the following theorem:
Theorem (√2 is irrational)
√2 is irrational, that is, there are no p, q ∈ Z such that p/q = √2.
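A proof sketch of this theorem, following the standard argument, could look as follows. It additionally assumes that the fraction p/q is written in lowest terms, which is always possible.

```latex
\begin{proof}[Proof sketch]
Assume, for contradiction, that $\sqrt{2} = \frac{p}{q}$ with $p, q \in \mathbb{Z}$, $q \neq 0$,
where the fraction is in lowest terms, so $p$ and $q$ have no common factor.
Squaring gives $p^2 = 2q^2$, so $p^2$ is even and hence $p$ is even, say $p = 2k$.
Then $4k^2 = 2q^2$, that is, $q^2 = 2k^2$, so $q$ is even as well.
Thus $2$ divides both $p$ and $q$, which contradicts the assumption
that the fraction is in lowest terms.
\end{proof}
```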
Induction
Induction is a proof method which can be used for sets that have smallest elements (formally: well-founded sets).
For example: natural numbers (N)
When we want to show that all elements of such a set have a certain property, we can do this as follows:
Base case: Prove that all smallest elements of the set have the property (in the case of N: 0 or 1).
Induction case: Prove that an arbitrary element e (which is not one of the smallest elements) has the property, under the assumption that all smaller elements have the property.
When we have proven these two parts, we can deduce that the property holds for all elements of the set.
Example for Induction
Prove the following theorem:
Theorem:
For all n > 0 it holds that 1 + 2 + ··· + n = n · (n + 1) / 2.
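The induction proof can be sketched as follows. The smallest element is n = 1, and the induction case only uses the assumption for the direct predecessor.

```latex
\begin{proof}[Proof sketch]
\emph{Base case} ($n = 1$): $1 = \frac{1 \cdot (1 + 1)}{2}$.

\emph{Induction case}: let $n \geq 1$ and assume that
$1 + 2 + \dots + n = \frac{n (n + 1)}{2}$. Then
\begin{align*}
1 + 2 + \dots + n + (n + 1)
  &= \frac{n (n + 1)}{2} + (n + 1) \\
  &= \frac{n (n + 1) + 2 (n + 1)}{2} \\
  &= \frac{(n + 1)(n + 2)}{2},
\end{align*}
which is the claimed formula for $n + 1$.
\end{proof}
```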
Words, Grammars and the Chomsky Hierarchy Languages and Grammars
Words
Alphabet
An alphabet Σ is a finite set of symbols.
Word
A word over Σ is a finite string of symbols from Σ.
The set of all words over Σ is denoted by Σ∗.
The empty word (the word of length 0) is denoted by ε.
The set of all non-empty words over Σ is denoted by Σ+.
Languages
Language
Let Σ be an alphabet. A (formal) language L over Σ is a set of words over Σ.
That is: L ⊆ Σ∗.
Example languages
Alphabets and languages:
Σ1 = {(, ), +, −, ∗, /, a}
L1 = {w ∈ Σ1∗ | w is an arithmetical expression}
Σ2 = {a, ..., z, ä, ü, ö, ß, ., ,, :, ...}
L2 = grammatically correct sentences of German
Σ3 = arbitrary
L3 = ∅, L3′ = {ε}
Typical languages over the alphabet Σ4 = {a, b}:
L4 = {w ∈ Σ4∗ | w contains aba as subword}
L5 = {a^n b^n | n ∈ N}
L6 = {a^n b^n c^n | n ∈ N}
(where x^n = x...x, with x repeated n times)
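Membership in L4 and L5 can be tested with two short functions. This is a sketch under assumptions: words are modelled as Python strings, the function names are made up, and 0 is taken to be a natural number, so that the empty word belongs to L5.

```python
def in_l4(w):
    """w is a word over {a, b} that contains aba as a subword."""
    return set(w) <= {"a", "b"} and "aba" in w


def in_l5(w):
    """w has the form a^n b^n for some n >= 0."""
    n = len(w) // 2
    # For words of odd length the comparison fails, because the lengths differ.
    return w == "a" * n + "b" * n
```

For example, `in_l5("aabb")` holds, while `in_l5("abab")` does not, because the a's and b's are not grouped.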
Grammars (introduction)
Languages are, in general, infinite: they may contain infinitely many words.
We need finite representations ⇒ Grammars
Grammars for natural languages: a means of representing all syntactically correct sentences
Grammars in computer science: finitely many rules which generate all words in the language
For example: Σ = {der, die, das, kleine, bissige, große, Hund, Katze, jagt}.
Grammars (introduction)
〈Satz〉 → 〈Subjekt〉〈Prädikat〉〈Objekt〉
〈Subjekt〉 → 〈Artikel〉〈Attribut〉〈Substantiv〉
〈Artikel〉 → ε
〈Artikel〉 → der
〈Artikel〉 → die
〈Artikel〉 → das
〈Attribut〉 → ε
〈Attribut〉 → 〈Adjektiv〉
〈Attribut〉 → 〈Adjektiv〉〈Attribut〉
〈Adjektiv〉 → kleine
〈Adjektiv〉 → bissige
〈Adjektiv〉 → große
〈Substantiv〉 → Hund
〈Substantiv〉 → Katze
〈Prädikat〉 → jagt
〈Objekt〉 → 〈Artikel〉〈Attribut〉〈Substantiv〉
Grammars (introduction)
[Figure: derivation tree for the sentence “der kleine bissige Hund jagt die große Katze”. The root 〈Satz〉 has the children 〈Subjekt〉, 〈Prädikat〉 and 〈Objekt〉; subject and object are each split into 〈Artikel〉, 〈Attribut〉 and 〈Substantiv〉.]
Grammars (definition)
Grammars consist of rules of the form
left-hand side → right-hand side
Two types of symbols can occur (both on the left-hand side and on the right-hand side):
Non-terminals (the variables, from which parts of words are derived)
Terminals (symbols which occur in the actual words)
Grammars (definition)
Definition (Grammar)
A grammar G is a 4-tuple G = (V, Σ, P, S) such that the following conditions hold:
V is a finite set of non-terminals (or variables).
Σ is the finite alphabet (the set of terminals). (The following must hold: V ∩ Σ = ∅, that is, no symbol is both a terminal and a non-terminal.)
P is a finite set of rules (also called productions), where P ⊆ (V ∪ Σ)+ × (V ∪ Σ)∗.
S ∈ V is the start variable.
Grammars (definition)
What do productions look like?
P ⊆ (V ∪ Σ)+ × (V ∪ Σ)∗
A production from P is a pair (l, r) of words over V ∪ Σ. A production is usually written l → r.
Both l and r consist of variables and terminal symbols.
l cannot be empty (a rule must always replace at least one symbol).
Words from (Σ ∪ V )∗ are also called sentence forms.
Grammars (definition)
Conventions:
Variables: A, B, C , . . . , S , T , . . .
Terminal symbols: a, b, c, ... and 0, 1, ...
Words from (V ∪ Σ)∗ (or Σ∗): u, v , w , x , y , z , . . .
Notation:
The concatenation of two words u, v is denoted uv .
Grammars (example)
Example grammar
G = (V, Σ, P, S) with
V = {S, B, C}
Σ = {a, b, c}
P = {S → aSBC, S → aBC, CB → BC, aB → ab, bB → bb, bC → bc, cC → cc}
Grammars (derivations)
How are the productions used to generate words from the start variable S?
Idea: When the grammar contains a production l → r, we may replace l by r.
Example:
Production: CB → BC
Derivation step: aab CB Bcca ⇒ aab BC Bcca
Here x = aab, l = CB, r = BC and y = Bcca: the step turns the word xly into the word xry.
Grammars (derivations)
How do we use productions to derive words from the start variable S?
Definition (Derivation step)
Let G = (V, Σ, P, S) be a grammar and u, v ∈ (V ∪ Σ)∗ be words. It holds that
u ⇒G v (u directly derives v under G),
when u, v have the following form:
u = xly and v = xry,
where x, y ∈ (V ∪ Σ)∗ and l → r is a rule in P.
Grammars (derivations)
Derivation
A sequence of words w0, w1, w2, ..., wn with w0 = S and
w0 ⇒G w1 ⇒G w2 ⇒G ··· ⇒G wn
is a derivation of wn (from S).
The wi can contain both terminals and non-terminals. Such words are also called sentence forms.
In this case, we also write w0 ⇒∗G wn.
Grammars and languages
Language generated by a grammar
The language generated by a grammar G = (V, Σ, P, S) is:
L(G) = {w ∈ Σ∗ | S ⇒∗G w}.
In other words:
The language generated by G consists of those words that can be derived from the start variable S in one or more derivation steps and consist only of terminal symbols.
Which language does the example grammar generate?
Example Grammar
G = (V, Σ, P, S) with
V = {S, B, C}
Σ = {a, b, c}
P consists of:
S → aSBC    aB → ab    bB → bb    CB → BC
S → aBC     bC → bc    cC → cc
The above example grammar G generates the language
L(G) = {a^n b^n c^n | n ≥ 1}.
Here, a^n = a...a (n times).
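The claim can be checked for short words by a systematic search. The sketch below is an illustration, not the procedure from the lecture: it assumes that sentence forms are strings with upper-case variables and lower-case terminals, and the names `RULES` and `generated_words` are made up. Because no rule of this grammar makes a sentence form shorter, forms longer than `max_len` can be discarded safely.

```python
from collections import deque

# Productions of the example grammar, each written as a pair (l, r).
RULES = [
    ("S", "aSBC"), ("S", "aBC"), ("CB", "BC"),
    ("aB", "ab"), ("bB", "bb"), ("bC", "bc"), ("cC", "cc"),
]


def generated_words(max_len, start="S", rules=RULES):
    """Breadth-first search over all sentence forms of length <= max_len."""
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        u = queue.popleft()
        if u.islower():  # only terminal symbols remain
            words.add(u)
            continue
        for l, r in rules:
            i = u.find(l)
            while i != -1:
                v = u[:i] + r + u[i + len(l):]
                if len(v) <= max_len and v not in seen:
                    seen.add(v)
                    queue.append(v)
                i = u.find(l, i + 1)
    return words
```

For `max_len = 9` the search returns exactly abc, aabbcc and aaabbbccc, which is consistent with the stated language. This is a finite check and does not replace a proof.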
Grammars and languages
Comment:
Deriving is not a deterministic process, but a non-deterministic one. For a word u ∈ (V ∪ Σ)∗ it is possible that there is no word v with u ⇒G v, or that there are several.
In other words: ⇒G is not a function.
This non-determinism can be caused by two things.
Sander Bruggink Automaten und Formale Sprachen 78
Words, Grammars and the Chomsky Hierarchy Languages and Grammars
Grammars and Languages
Two different rules can be applied.
In the example grammar:
S ⇒ aSBC or S ⇒ aBC
aaSBCBC ⇒ aaaBCBCBC or aaSBCBC ⇒ aaSBBCC
A rule can be applied in two different places.
In the example grammar:
aaaSBCBCBC ⇒ aaaSBBCCBC or aaaSBCBCBC ⇒ aaaSBCBBCC
Sander Bruggink Automaten und Formale Sprachen 79
Words, Grammars and the Chomsky Hierarchy Languages and Grammars
Grammars and languages
Further comments:
Derivations can be arbitrarily long and never reach a word which consists only of terminal symbols:
S ⇒ aSBC ⇒ aaSBCBC ⇒ aaaSBCBCBC ⇒ . . .
Sometimes a derivation ends in a deadlock, in which no rule can be applied but the word still contains non-terminal symbols:
S ⇒ aSBC ⇒ aaBCBC ⇒ aabCBC ⇒ aabcBC ⇏
A word is generated by a grammar when there is at least one derivation of the word from the start variable.
Sander Bruggink Automaten und Formale Sprachen 80
Words, Grammars and the Chomsky Hierarchy Languages and Grammars
Backus-Naur-Form
We will use the following shorthand notation for grammars:
When there are rules
A → w1
...
A → wn
we also write
A → w1 | · · · | wn
Sander Bruggink Automaten und Formale Sprachen 81
Words, Grammars and the Chomsky Hierarchy Languages and Grammars
Adventurous grammars
Σ =
{, , , , , ,
}
The Door Rule
You can only go through a door when you have found a key before. (This key can be used arbitrarily often.)
G1 = ({K, N, X}, Σ, P1, N), where P1 consists of the following productions:
N → XN | K | ε
K → XK | K | K | ε
X →∣∣ ∣∣ ∣∣ ∣∣
Sander Bruggink Automaten und Formale Sprachen 82
Words, Grammars and the Chomsky Hierarchy Languages and Grammars
Adventurous grammars
Σ =
{, , , , , ,
}
New Door Rule (Level 2)
The keys are magical and disappear immediately after being used to open a door. As soon as you go through a door, the door is locked again.
G2 = ({S, X}, Σ, P2, S), where P2 consists of the following productions:
S → XS | S | S | SS | ε
X →∣∣ ∣∣ ∣∣ ∣∣
Sander Bruggink Automaten und Formale Sprachen 83
Words, Grammars and the Chomsky Hierarchy Languages and Grammars
Adventurous grammars
G1: Grammar for the Door Rule:
N → XN | K | ε
K → XK | K | K | ε
X →∣∣ ∣∣ ∣∣ ∣∣
G2: Grammar for the New Door Rule:
S → XS | S | S | SS | ε
X →∣∣ ∣∣ ∣∣ ∣∣
Why is G2 more complex than G1? ⇒ Chomsky-Hierarchy
Sander Bruggink Automaten und Formale Sprachen 84
Words, Grammars and the Chomsky Hierarchy The Chomsky Hierarchy
Chomsky Hierarchy
We classify grammars according to the form of their rules:
Chomsky Hierarchy for grammars
Chomsky Type 0: Every grammar is of Type 0. (There are no constraints.)
Chomsky Type 1: For all rules l → r it holds that |l| ≤ |r|.
Chomsky Type 2: Additionally, it holds for all rules l → r that l ∈ V. (That is, l is a single variable.)
Chomsky Type 3: Additionally, it holds for all rules l → r that r = a or r = aB, for a ∈ Σ and B ∈ V.
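The constraints above can be checked mechanically. The following sketch assumes that each symbol is one character and that rules are (left, right) pairs of strings; it ignores the special rule for ε, and the function name chomsky_type is only illustrative.

```python
def chomsky_type(rules, variables, terminals):
    """Return the largest i such that all rules satisfy the Type i constraints.

    Assumption of this sketch: the special rule for the empty word is
    not handled, so a rule S -> epsilon makes the grammar Type 0 here.
    """
    # Type 1: no rule shrinks the word.
    if not all(len(left) <= len(right) for left, right in rules):
        return 0
    # Type 2: every left side is a single variable.
    if not all(left in variables for left, _ in rules):
        return 1

    def right_linear(right):
        # Type 3: right side is a, or aB, with a terminal and B variable.
        if len(right) == 1:
            return right in terminals
        return (len(right) == 2 and right[0] in terminals
                and right[1] in variables)

    if not all(right_linear(right) for _, right in rules):
        return 2
    return 3

example = [("S", "aSBC"), ("S", "aBC"), ("aB", "ab"), ("bB", "bb"),
           ("bC", "bc"), ("cC", "cc"), ("CB", "BC")]
print(chomsky_type(example, {"S", "B", "C"}, {"a", "b", "c"}))
```

The function reports the most restrictive type that fits the rules as written; the language itself may still be of a higher type if a different grammar generates it.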
Sander Bruggink Automaten und Formale Sprachen 85
Words, Grammars and the Chomsky Hierarchy The Chomsky Hierarchy
Special rule for ε
Special rule for ε (for Type 1, Type 2 and Type 3 grammars)
When S is the start symbol, the rule S → ε may occur, provided that S does not occur anywhere on the right side of a rule.
Sander Bruggink Automaten und Formale Sprachen 86
Words, Grammars and the Chomsky Hierarchy The Chomsky Hierarchy
Chomsky Hierarchy
Grammars
Type 0: no constraints
Type 1: |l| ≤ |r|
Type 2: l ∈ V
Type 3: r = a or r = aB
Sander Bruggink Automaten und Formale Sprachen 87
Words, Grammars and the Chomsky Hierarchy The Chomsky Hierarchy
Chomsky Hierarchy
Names of grammar classes
Type 0: . . .
Type 1: context-sensitive grammars, monotone grammars
Type 2: context-free grammars
Type 3: regular grammars
Sander Bruggink Automaten und Formale Sprachen 88
Words, Grammars and the Chomsky Hierarchy The Chomsky Hierarchy
Chomsky Hierarchy
Chomsky Hierarchy for languages
A language L ⊆ Σ∗ is of Type i (i ∈ {0, 1, 2, 3}) when there is a Type i grammar G with L(G) = L (that is, L is generated by G).
Names of language classes
Type 0: semi-decidable languages, recursively enumerable languages
Type 1: context sensitive languages
Type 2: context free languages, algebraic languages
Type 3: regular languages
Sander Bruggink Automaten und Formale Sprachen 89
Words, Grammars and the Chomsky Hierarchy The Chomsky Hierarchy
Chomsky Hierarchy
Grammars    Languages
            All languages
Type 0      Type 0
Type 1      Type 1
Type 2      Type 2
Type 3      Type 3
Sander Bruggink Automaten und Formale Sprachen 90
Words, Grammars and the Chomsky Hierarchy The Chomsky Hierarchy
Chomsky Type of a Grammar and Language
Context-free grammar G2:
S → X | ε
X → aXa | aa
Regular grammar G1:
S → aX | ε
Y → aX
X → aY | a
L(G1) = {a^n | n is even} = L(G2)
This language is context-free (it is generated by G2) and also regular (it is generated by G1).
Sander Bruggink Automaten und Formale Sprachen 91
Words, Grammars and the Chomsky Hierarchy Word Problem for Context Sensitive Languages
Word Problem
Word Problem
Let a grammar G (of arbitrary type) and a word w ∈ Σ∗ be given. Decide whether w ∈ L(G).
Decidability of the word problem (Theorem)
The word problem is decidable for Type 1 grammars (and as such also for regular and context-free grammars). That is: there is a procedure that decides whether w ∈ L(G).
Sander Bruggink Automaten und Formale Sprachen 92
Words, Grammars and the Chomsky Hierarchy Word Problem for Context Sensitive Languages
Word Problem for Type 1 Languages
Algorithm to solve the word problem for Type 1 grammars: returns “true” if and only if w ∈ L(G).
input (G, w)
T := {S}
repeat
    T′ := T
    T := T′ ∪ {u | |u| ≤ |w| and u′ ⇒ u, for some u′ ∈ T′}
until w ∈ T or T = T′
return w ∈ T
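The algorithm above can be transcribed almost line by line. The sketch below assumes single-character symbols and rules given as (left, right) pairs of strings; the helper step and the name word_problem are illustrative choices, not taken from the lecture.

```python
def step(u, rules):
    """All words reachable from u in exactly one derivation step."""
    result = set()
    for left, right in rules:
        pos = u.find(left)
        while pos != -1:
            result.add(u[:pos] + right + u[pos + len(left):])
            pos = u.find(left, pos + 1)
    return result

def word_problem(rules, start, w):
    """Decide whether w is generated, assuming no rule shrinks a word."""
    t = {start}
    while True:
        t_old = t
        # Add all one-step successors that are not longer than w.
        t = t_old | {v for u in t_old for v in step(u, rules)
                     if len(v) <= len(w)}
        if w in t or t == t_old:
            return w in t

RULES = [("S", "aSBC"), ("S", "aBC"), ("aB", "ab"), ("bB", "bb"),
         ("bC", "bc"), ("cC", "cc"), ("CB", "BC")]
print(word_problem(RULES, "S", "aabbcc"))
print(word_problem(RULES, "S", "aabbc"))
```

Termination relies on the Type 1 condition |l| ≤ |r|: words never get shorter, so only the finitely many words of length at most |w| have to be collected.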
Sander Bruggink Automaten und Formale Sprachen 93
Regular Languages
Sander Bruggink Automaten und Formale Sprachen 94
Regular Languages
Regular Languages
We concern ourselves with regular languages for a few weeks.
deterministic and non-deterministic finite automata
regular expressions
proving that a language is not regular: Pumping Lemma
minimal automata and acceptance equivalence
closure properties and decision procedures
Sander Bruggink Automaten und Formale Sprachen 94
Regular Languages Finite Automata
Finite Automata
In this part we concern ourselves with regular languages, but first from a different viewpoint. Instead of Type 3 grammars we consider state-based automaton models, which can be viewed as “language acceptors”.
[Figure: automaton with states z1 and z2 and transitions labelled a and b]
Sander Bruggink Automaten und Formale Sprachen 95
Regular Languages Finite Automata
Deterministic Finite Automata
Graphical notation:
State: z    Initial state: z0    Final state: zE
Transition: from z1 to z2, labelled a
Example (Σ = {a, b}):
[Figure: automaton with states z1 and z2 and transitions labelled a and b]
Sander Bruggink Automaten und Formale Sprachen 96
Regular Languages Finite Automata
Run of a Deterministic Finite Automaton
Input: b a a b a a b
Accepted
[Figure: step-by-step run of the automaton with states z1 and z2 on this input]
Sander Bruggink Automaten und Formale Sprachen 97
Regular Languages Finite Automata
Deterministic Finite Automata
Informal definition:
A deterministic finite automaton (DFA) consists of
states (one of which is the initial state, and some of which are final states)
a transition function
The following conditions hold:
The alphabet and the state set are finite.
The transition function maps each pair of a state and an alphabet symbol to exactly one successor state.
A word is accepted by the DFA when, starting at the initial state, one reaches a final state after reading the word.
Sander Bruggink Automaten und Formale Sprachen 98
Regular Languages Finite Automata
Deterministic Finite Automata
Deterministic Finite Automaton (definition)
A (deterministic) finite automaton (DFA) M is a 5-tuple M = (Z, Σ, δ, z0, E), where
Z is the set of states,
Σ is the input alphabet (with Z ∩ Σ = ∅),
z0 ∈ Z is the initial state,
E ⊆ Z is the set of final states and
δ : Z × Σ→ Z is the transition function
Z , Σ must be finite sets.
Sander Bruggink Automaten und Formale Sprachen 99
Regular Languages Finite Automata
Deterministic Finite Automata
The transition function δ reads only a single symbol at a time. Therefore, we generalize it to a transition function which reads entire words.
Multi-step transitions
For a given DFA M = (Z, Σ, δ, z0, E) we inductively define the function δ : Z × Σ∗ → Z as follows:
δ(z, ε) = z
δ(z, ax) = δ(δ(z, a), x)
where z ∈ Z, x ∈ Σ∗ and a ∈ Σ.
Sander Bruggink Automaten und Formale Sprachen 100
Regular Languages Finite Automata
Deterministic Finite Automata
Accepted Language
The language accepted by a DFA M = (Z ,Σ, δ, z0,E ) is
T (M) = {x ∈ Σ∗ | δ(z0, x) ∈ E}.
In other words: The language can be obtained by enumerating all paths from the initial state to a final state and, for each path, concatenating the symbols on its transitions.
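The multi-step transition function and the acceptance condition can be sketched as follows. The DFA used here (words over {a, b} with an even number of a's) is a hypothetical example, and the names delta_hat, accepts and DELTA are assumptions of this sketch.

```python
def delta_hat(delta, z, word):
    """Multi-step transition: follows the inductive definition.

    delta_hat(z, epsilon) = z
    delta_hat(z, a x)     = delta_hat(delta(z, a), x)
    """
    if word == "":
        return z
    return delta_hat(delta, delta[(z, word[0])], word[1:])

def accepts(delta, z0, final, word):
    """A word is accepted when the run from z0 ends in a final state."""
    return delta_hat(delta, z0, word) in final

# Hypothetical DFA: the state records the parity of the number of a's.
DELTA = {
    ("even", "a"): "odd",  ("even", "b"): "even",
    ("odd", "a"): "even",  ("odd", "b"): "odd",
}
print(accepts(DELTA, "even", {"even"}, "baab"))
```

Because δ is a total function, the dictionary needs exactly one entry per pair of state and symbol.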
Sander Bruggink Automaten und Formale Sprachen 101
Regular Languages Finite Automata
Deterministic Finite Automata (example)
Let Σ = {a, b}. Which language does the following DFA accept?
[Figure: DFA with states z0, z1 and z2 and transitions labelled a and b]
Construct a DFA which accepts the following language:
L = {x ∈ Σ∗ | x starts with a and ends with b}
Sander Bruggink Automaten und Formale Sprachen 102
Regular Languages Finite Automata
Deterministic Finite Automata
DFAs → regular languages (Theorem)
Each language accepted by a DFA is regular.
Idea:
States → Variables
Transitions → Productions
Formally: We construct the grammar G = (V, Σ, P, S), where V = Z, S = z0 and P contains the following productions:
If ε ∈ T(M), then P contains a production S → ε.
For all z1 ∈ Z and a ∈ Σ:
If δ(z1, a) = z2, then (z1 → az2) ∈ P.
If additionally z2 ∈ E, then (z1 → a) ∈ P.
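The construction can be sketched as a function that lists the productions. Right sides are represented as tuples of symbols, with the empty tuple standing for ε; this representation, the name dfa_to_grammar and the example DFA are assumptions of the sketch.

```python
def dfa_to_grammar(states, alphabet, delta, z0, final):
    """Return the productions of a regular grammar with V = Z and S = z0."""
    rules = []
    if z0 in final:
        # epsilon is accepted exactly when the initial state is final.
        rules.append((z0, ()))
    for z1 in states:
        for a in alphabet:
            z2 = delta[(z1, a)]
            rules.append((z1, (a, z2)))      # z1 -> a z2
            if z2 in final:
                rules.append((z1, (a,)))     # z1 -> a
    return rules

# Hypothetical DFA for words over {a, b} with an even number of a's.
DELTA = {
    ("even", "a"): "odd",  ("even", "b"): "even",
    ("odd", "a"): "even",  ("odd", "b"): "odd",
}
rules = dfa_to_grammar(["even", "odd"], "ab", DELTA, "even", {"even"})
for left, right in rules:
    print(left, "->", " ".join(right) if right else "epsilon")
```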
Sander Bruggink Automaten und Formale Sprachen 103
Regular Languages Finite Automata
Non-deterministic Finite Automata
In contrast to grammars, there are no non-deterministic effects in DFAs. This means that, as soon as the next symbol is read, it is clear which state will be the next.
But: in many cases, it is more natural to consider non-deterministic transitions. This often leads to smaller and clearer automata.
[Figure: state z1 with two transitions labelled a, one to z2 and one to z3]
Sander Bruggink Automaten und Formale Sprachen 104
Regular Languages Finite Automata
Non-deterministic Finite Automata (Idea)
In a non-deterministic finite automaton there are, for each pair (z, a) of a state z and a symbol a, no, one or several successor states.
[Figure: NFA with states z0, z1, z2 and zE, transitions labelled a, b and c between consecutive states, and loops labelled a, b, c]
A non-deterministic automaton chooses (non-deterministically) one of the possible successor states. A word is accepted by the automaton when, starting from an initial state, it can reach a final state after reading the word (if the automaton always chooses “correctly”).
Sander Bruggink Automaten und Formale Sprachen 105
Regular Languages Finite Automata
Run of a Non-deterministic Automaton
Input: a b a b b a b
Accepted
[Figure: step-by-step run of an NFA with states z0, z1, z2 and z3 on this input]
Sander Bruggink Automaten und Formale Sprachen 106
Regular Languages Finite Automata
Non-deterministic Finite Automata
Definition: Non-deterministic Finite Automaton
A non-deterministic finite automaton (NFA) M is a 5-tuple M = (Z, Σ, δ, S, E), where
Z is the set of states,
Σ is the input alphabet (with Z ∩ Σ = ∅),
S ⊆ Z is the set of initial states,
E ⊆ Z is the set of final states and
δ : Z × Σ→ P(Z ) is the transition function.
Z , Σ must be finite.
Sander Bruggink Automaten und Formale Sprachen 107
Regular Languages Finite Automata
Non-deterministic Finite Automata
The transition function δ is again extended to a multi-step transition function:
Multi-step transitions
Let M = (Z, Σ, δ, S, E) be an NFA. We inductively define the function δ : P(Z) × Σ∗ → P(Z) as follows:
δ(Z′, ε) = Z′
δ(Z′, ax) = ⋃_{z ∈ Z′} δ(δ(z, a), x)
where Z′ ⊆ Z, x ∈ Σ∗ and a ∈ Σ.
Sander Bruggink Automaten und Formale Sprachen 108
Regular Languages Finite Automata
Non-deterministic Finite Automata
Accepted Language
The language accepted by an NFA M = (Z, Σ, δ, S, E) is:
T(M) = {x ∈ Σ∗ | δ(S, x) ∩ E ≠ ∅}.
In other words: a word w is accepted when there exists a path from an initial state to a final state whose transitions are labelled with the symbols of w, in order. (There may be several such paths.)
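Acceptance by an NFA can be sketched by tracking the set of states the automaton can currently be in. The loop below is an iterative reading of the inductive definition; the NFA (words over {a, b, c} that contain abc) is a hypothetical example, and missing dictionary entries stand for δ(z, a) = ∅.

```python
def nfa_delta_hat(delta, states, word):
    """Multi-step transition on sets of states."""
    current = set(states)
    for a in word:
        # Union of the successor sets of all current states.
        current = {t for z in current for t in delta.get((z, a), set())}
    return current

def nfa_accepts(delta, initial, final, word):
    """Accepted when some reachable state is final."""
    return bool(nfa_delta_hat(delta, initial, word) & final)

# Hypothetical NFA: guess where the factor abc starts.
NFA = {("z0", x): {"z0"} for x in "abc"}
NFA[("z0", "a")] = {"z0", "z1"}
NFA[("z1", "b")] = {"z2"}
NFA[("z2", "c")] = {"zE"}
for x in "abc":
    NFA[("zE", x)] = {"zE"}

print(nfa_accepts(NFA, {"z0"}, {"zE"}, "cabca"))
```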
Sander Bruggink Automaten und Formale Sprachen 109
Regular Languages Finite Automata
Non-deterministic Finite Automata
Differences between DFAs and NFAs
DFA: δ(z, a) ∈ Z
NFA: δ(z, a) ∈ P(Z)
In a DFA there exists, for each state z and alphabet symbol a, exactly one successor state.
In an NFA it is allowed that for some state z and alphabet symbol a there is more than one successor state: δ(z, a) = {z1, z2, . . .}.
In an NFA it is allowed that for some state z and alphabet symbol a there are no successor states: δ(z, a) = ∅.
A DFA is a restricted kind of NFA. We only need a small translation: δ(z, a) = z′ becomes δ(z, a) = {z′}.
Sander Bruggink Automaten und Formale Sprachen 110
Regular Languages Finite Automata
Non-deterministic Finite Automata
Let Σ = {a, b}. Which language does the following NFA accept?
[Figure: NFA with states 1, 2, 3 and 4 and transitions labelled a and a, b]
We try to find an NFA which accepts the following language:
L = {x | x starts with a and ends with b}
Sander Bruggink Automaten und Formale Sprachen 111
Regular Languages Finite Automata
From NFAs to DFAs
NFAs → DFAs (Theorem)
Every language which is accepted by an NFA is also accepted by a DFA.
Idea: We let the various “parallel universes” be simulated by a single automaton. This automaton remembers in which states the NFA can currently be.
That is, the states of the DFA are sets of states of the original NFA. Thus, this construction is called the subset construction.
Sander Bruggink Automaten und Formale Sprachen 112
Regular Languages Finite Automata
Run of a Non-deterministic Finite Automaton
Input: a b a b b a b
Accept
[Figure: step-by-step run on this input, tracking the set of states among z0, z1, z2 and z3 in which the NFA can be]
Sander Bruggink Automaten und Formale Sprachen 113
Regular Languages Finite Automata
From NFAs to DFAs
Subset construction:
Let an NFA M = (Z, Σ, δ, S, E) be given. We construct a DFA M′ = (𝒵, Σ, δ′, z′0, E′) with:
𝒵 = P(Z)
δ′(Z′, a) = δ(Z′, a) for Z′ ⊆ Z
z′0 = S
E′ = {Z′ ⊆ Z | Z′ ∩ E ≠ ∅}
It holds that T(M) = T(M′).
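A sketch of the subset construction follows. As a deviation from the definition above, it builds only the subsets that are reachable from the initial subset instead of all of P(Z); this does not change the accepted language. The example NFA (words over {a, b} ending in ab) and all function names are assumptions.

```python
def subset_construction(alphabet, delta, initial, final):
    """Build the reachable part of the DFA whose states are sets of NFA states."""
    start = frozenset(initial)
    dfa_delta = {}
    seen = {start}
    todo = [start]
    while todo:
        subset = todo.pop()
        for a in alphabet:
            # delta'(Z', a) is the union of delta(z, a) over all z in Z'.
            target = frozenset(t for z in subset
                               for t in delta.get((z, a), ()))
            dfa_delta[(subset, a)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    # A subset is final when it contains at least one final NFA state.
    dfa_final = {s for s in seen if s & frozenset(final)}
    return seen, dfa_delta, start, dfa_final

def dfa_accepts(dfa_delta, start, dfa_final, word):
    state = start
    for a in word:
        state = dfa_delta[(state, a)]
    return state in dfa_final

# Hypothetical NFA for words over {a, b} that end in ab.
NFA = {("q0", "a"): {"q0", "q1"}, ("q0", "b"): {"q0"}, ("q1", "b"): {"q2"}}
states, d, s0, fin = subset_construction("ab", NFA, {"q0"}, {"q2"})
print(len(states), dfa_accepts(d, s0, fin, "aab"))
```

In this example only three of the eight possible subsets are reachable, which illustrates why the exponential bound is a worst case.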
Sander Bruggink Automaten und Formale Sprachen 114
Regular Languages Finite Automata
From NFAs to DFAs
Remarks about the subset construction:
Because |P(Z)| = 2^|Z|, the DFA has exponentially many states (in the worst case). Sometimes the automaton can be minimized afterwards.
However, in many cases the smallest DFA which accepts a language is exponentially larger than the smallest NFA.
Sander Bruggink Automaten und Formale Sprachen 115
Regular Languages Finite Automata
NFAs, DFAs and Regular Grammars
We can now
convert NFAs to DFAs
convert DFAs to regular grammars
The direction regular grammar → NFA is still missing.
[Figure: conversion diagram between regular grammar, DFA and NFA]
Sander Bruggink Automaten und Formale Sprachen 116
Regular Languages Finite Automata
NFAs, DFAs and Regular Grammars
Regular grammars → NFAs (Theorem)
For each regular grammar G there is an NFA M such that L(G ) = T (M).
Construction. Let G = (V, Σ, P, S) be a regular grammar. We construct the NFA M = (Z, Σ, δ, S′, E), where
Z = V ∪ {X}, X ∉ V
S′ = {S}
E = {S, X} when (S → ε) ∈ P, and E = {X} when (S → ε) ∉ P
B ∈ δ(A, a) when (A → aB) ∈ P
X ∈ δ(A, a) when (A → a) ∈ P
It holds that T(M) = L(G).
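The construction can be sketched in a few lines. Rules are (left, right) pairs of strings with single-character symbols and "" for ε; the name of the extra state ("X_new") is an assumption chosen so that it cannot clash with a one-character variable. The grammar is the regular grammar G1 for words a^n with n even.

```python
def grammar_to_nfa(rules, start, extra="X_new"):
    """Turn a regular grammar into an NFA (delta, initial states, final states)."""
    delta = {}
    for left, right in rules:
        if right == "":
            continue                      # S -> epsilon only affects E
        # A -> aB gives B in delta(A, a); A -> a gives the extra state.
        target = right[1] if len(right) == 2 else extra
        delta.setdefault((left, right[0]), set()).add(target)
    final = {extra}
    if (start, "") in rules:
        final.add(start)
    return delta, {start}, final

def nfa_accepts(delta, initial, final, word):
    current = set(initial)
    for a in word:
        current = {t for z in current for t in delta.get((z, a), set())}
    return bool(current & final)

G1 = [("S", "aX"), ("S", ""), ("Y", "aX"), ("X", "aY"), ("X", "a")]
delta, initial, final = grammar_to_nfa(G1, "S")
print([n for n in range(7) if nfa_accepts(delta, initial, final, "a" * n)])
```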
Sander Bruggink Automaten und Formale Sprachen 117
Regular Languages Finite Automata
Brief Summary
Now we can transform deterministic finite automata (DFAs) to regular grammars, regular grammars to non-deterministic finite automata (NFAs) and NFAs to DFAs.
[Figure: conversion cycle DFA → regular grammar → NFA → DFA]
Result: Regular grammars and deterministic and non-deterministic finite automata describe the same class of languages, namely the class of regular languages.
Sander Bruggink Automaten und Formale Sprachen 118
Regular Languages Finite Automata
Small Summary
Advantages and disadvantages of the formalisms:
Regular grammars
provide the connection to the Chomsky hierarchy
are used to generate languages
non-deterministic → less efficient for deciding whether a certain word is in the language
NFAs
allow short, compact representations of languages
intuitive, graphical representation
non-deterministic → less efficient for deciding whether a certain word is in the language
Sander Bruggink Automaten und Formale Sprachen 119
Regular Languages Finite Automata
Small Summary
DFAs
can be exponentially larger than equivalent NFAs
allow for an efficient solution of the word problem (we only need to follow the transitions of the automaton and check whether a final state was reached)
However, all these models require much effort and space to write down. Thus, we need a more compact representation: so-called regular expressions.
Sander Bruggink Automaten und Formale Sprachen 120
Regular Languages Regular Expressions
Regular Expressions
Regular expression
A regular expression is inductively defined as follows:
∅, ε and a (where a ∈ Σ) are regular expressions;
when α and β are regular expressions, then
(α | β), (αβ), (α∗)
are also regular expressions;
everything which cannot be generated by the above rules is not a regular expression.
Remark: Instead of (α | β) we also often see (α + β).
Sander Bruggink Automaten und Formale Sprachen 121
Regular Languages Regular Expressions
Regular Expressions
Now that we have fixed the syntax of regular expressions, we must determine their meaning, that is, which regular expression describes which language.
Language of a regular expression
L(∅) = ∅
L(ε) = {ε}
L(a) = {a}
L(α | β) = L(α) ∪ L(β)
L(αβ) = L(α)L(β), where L1L2 = {w1w2 | w1 ∈ L1, w2 ∈ L2} for two languages L1, L2.
L((α)∗) = (L(α))∗, where L∗ = {w1 . . . wn | n ∈ N0, wi ∈ L} for a language L.
Sander Bruggink Automaten und Formale Sprachen 122
Regular Languages Regular Expressions
Regular Expressions
Let Σ = {a, b, c}
α1 = (ab | ba)
α2 = (ab | ba)∗
α3 = (ab | ba)c∗
α4 = (a | b | c)∗abc(a | b | c)∗
L5 = Language of all words that start with a and end with bb
L6 = Language of all words that contain an even number of a’s
Sander Bruggink Automaten und Formale Sprachen 123
Regular Languages Regular Expressions
Regular Expressions
Regular grammar
DFA NFA
Regular expression
Sander Bruggink Automaten und Formale Sprachen 124
Regular Languages Regular Expressions
Regular Expression → NFAs
Regular expressions → NFAs
For each regular expression γ there exists an NFA M with L(γ) = T (M).
Proof by induction on the structure of γ.
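Base step: For γ = ∅, γ = ε and γ = a there are obvious corresponding automata:
[Figure: one automaton for each of ∅, ε and a; the automaton for a has a transition labelled a from z1 to z2]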
Sander Bruggink Automaten und Formale Sprachen 126
Regular Languages Regular Expressions
Regular Expression → NFA
Induction step
Let γ be an arbitrary composite regular expression.
This means that γ is of one of the following forms:
γ = α | β
γ = αβ
γ = α∗
where α and β are shorter (and therefore smaller) regular expressions. By the induction hypothesis we can assume that there are automata Mα and Mβ such that L(α) = T(Mα) and L(β) = T(Mβ).
Sander Bruggink Automaten und Formale Sprachen 127
Regular Languages Regular Expressions
Regular Expression → NFA
Case 1: Let γ = (α | β). We have Mα and Mβ with T(Mα) = L(α) and T(Mβ) = L(β). We construct M with T(M) = L(α | β).
The state set of M is the disjoint union of both state sets. Also, the set of initial states of M is the union of the two sets of initial states, and the set of final states of M is the union of both sets of final states.
All transitions of Mα and Mβ are maintained.
Then it holds that T(M) = T(Mα) ∪ T(Mβ) = L(α) ∪ L(β).
[Figure: Mα with initial states Sα and final states Eα, drawn next to Mβ with initial states Sβ and final states Eβ]
Sander Bruggink Automaten und Formale Sprachen 128
Regular Languages Regular Expressions
Regular Expression → NFA
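Case 2: Let γ = αβ.
[Figure: Mα (initial states Sα, final states Eα) followed by Mβ (initial states Sβ, final states Eβ); a transition labelled a into Eα gets a new copy labelled a leading to Sβ]
It holds that T(M) = T(Mα)T(Mβ) = L(α)L(β).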
Sander Bruggink Automaten und Formale Sprachen 129
Regular Languages Regular Expressions
Regular Expression → NFA
Case 2: Let γ = αβ. There are automata Mα and Mβ with T(Mα) = L(α) and T(Mβ) = L(β). We compose these automata as follows:
The state set of M is the disjoint union of the state sets of Mα and Mβ. M has the same initial states as Mα and the same final states as Mβ. (When ε ∈ L(α), then the initial states of Mβ are also initial states of M.)
All transitions of Mα and Mβ are preserved.
All states of Mα that have a transition to a final state of Mα obtain an additional transition with the same label to all initial states of Mβ.
Sander Bruggink Automaten und Formale Sprachen 130
Regular Languages Regular Expressions
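Regular Expression → NFA
Case 3: Let γ = (α)∗.
[Figure: Mα with initial states Sα and final states Eα; a transition labelled a into Eα gets a new copy labelled a leading back to Sα; possibly an additional state]
It holds that T(M) = (T(Mα))∗ = (L(α))∗.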
Sander Bruggink Automaten und Formale Sprachen 131
Regular Languages Regular Expressions
Regular Expression → NFA
Case 3: Let γ = (α)∗. There is an automaton Mα with T(Mα) = L(α). From this automaton we construct the automaton M as follows:
All states and all initial and final states are preserved.
Additionally, all states which have a transition to a final state of Mα obtain a new transition with the same label to all initial states of Mα.
When ε ∉ T(Mα), there is an additional state which is both initial and final.
Sander Bruggink Automaten und Formale Sprachen 132
Regular Languages Regular Expressions
Regular Expressions
Regular grammar
DFA NFA
Regular expression
Sander Bruggink Automaten und Formale Sprachen 133
Regular Languages Regular Expressions
NFA → Regular Expression
NFAs → Regular expressions
For each NFA M there is a regular expression γ with T (M) = L(γ).
Sander Bruggink Automaten und Formale Sprachen 134
Regular Languages Regular Expressions
NFA → Regular Expression
We use the following state elimination algorithm, which transforms an NFA M into a regular expression. As intermediate steps we obtain automata whose transitions are labelled with regular expressions instead of alphabet symbols.
[Figure: transition from z1 to z2 labelled α]
Sander Bruggink Automaten und Formale Sprachen 135
Regular Languages Regular Expressions
NFA → Regular Expression
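Step 1
First we add a new initial state and a new final state. The new initial state gets ε-labelled transitions to all old initial states, and all old final states get ε-labelled transitions to the new final state.
[Figure: new initial state with ε-transitions into the old initial states S; old final states E with ε-transitions into the new final state]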
Sander Bruggink Automaten und Formale Sprachen 136
Regular Languages Regular Expressions
NFA → Regular Expression
Now we non-deterministically apply transformation rules that decrease the size of the automaton while making sure that the new automaton still accepts the same language.
In the end, only the initial and the final state remain, connected by a single arrow which is labelled with the sought-after regular expression.
[Figure: transition from z1 to z2 labelled γ]
Sander Bruggink Automaten und Formale Sprachen 137
Regular Languages Regular Expressions
NFA → Regular Expression
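Rule V: Two parallel arrows with the labels α and β can be fused into a single transition labelled with (α | β).
[Figure: z1 with two arrows labelled α and β to z2 ⇒ z1 with one arrow labelled (α | β) to z2]
The same applies if a state has two loops.
[Figure: z with loops labelled α and β ⇒ z with one loop labelled (α | β)]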
Sander Bruggink Automaten und Formale Sprachen 138
Regular Languages Regular Expressions
NFA → Regular Expression
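Rule S: Loops are removed by prepending their label (followed by ∗) to the labels of the outgoing transitions, as follows:
[Figure: z with a loop labelled α and transitions labelled β1, . . . , βn to x1, . . . , xn ⇒ z without loop and with transitions labelled (α)∗β1, . . . , (α)∗βn to x1, . . . , xn]
This is only allowed if there is only a single loop on the state.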
Sander Bruggink Automaten und Formale Sprachen 139
Regular Languages Regular Expressions
NFA → Regular Expression
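Rule E: A state z is eliminated by connecting the states with transitions to z and the states with transitions from z with one another, as follows:
[Figure: x1, . . . , xn with transitions labelled α1, . . . , αn to z, and z with transitions labelled β1, . . . , βm to y1, . . . , ym ⇒ for all i and j, a transition labelled αiβj from xi to yj]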
Sander Bruggink Automaten und Formale Sprachen 140
Regular Languages Regular Expressions
NFA → Regular expression
Rule E may only be applied, if:
there is no loop attached to the removed state z and
the state z has at least one incoming and at least one outgoing edge.
Sander Bruggink Automaten und Formale Sprachen 141
Regular Languages Regular Expressions
NFA → Regular Expression
As soon as no rule can be applied any more, we have in general thefollowing situation (and sometimes some additional dead ends andunreachable states):
z1 z2γ
Then γ is the sought after regular expression.
When there is no transition between the initial and the final state, thenγ = ∅.
Sander Bruggink Automaten und Formale Sprachen 142
Regular Languages Regular Expressions
Regular Expressions
Practical applications of regular expressions:
Search and replace in text editors (Tools: vi, emacs, . . . )
Pattern matching and processing of large databases, for example for data mining. (Tools: grep, sed, awk, perl, . . . )
Translation of programming languages: lexical analysis, that is, the transformation of a string of symbols (the program) into a sequence of tokens, where keywords, identifiers, data, etc. are already identified. (Tools: lex, flex, . . . )
Sander Bruggink Automaten und Formale Sprachen 143
Regular Languages Regular Expressions
Regular Expressions in Practice
POSIX ERE syntax:
All symbols which do not have a special meaning are atomic symbols. For symbols with special meanings: \( = (, \[ = [, etc.
“αβ” “α|β” “α*”
“α?” ≡ (α | ε)    “α+” ≡ αα∗    “α{n}” ≡ α . . . α (n times α)
“.” ≡ a | · · · | z | 1 | · · · | 9 | % | # | · · ·
“[a1 . . . an]” ≡ (a1 | · · · | an)
“[^a1 . . . an]” ≡ any character except a1 . . . an
(, ) group and store (useful for replacing; stored groups are denoted by \1, \2, . . . )
. . .
Sander Bruggink Automaten und Formale Sprachen 144
Regular Languages Regular Expressions
Regular Expressions in Practice
Goal: In an HTML file there are wiki links of the form [[text]]. We want to replace such links by HTML hyperlinks.
In HTML, a hyperlink looks as follows: <a href="A"> T </a>
So we want to replace [[x]] by <a href="x.html">x</a>.
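One way to carry out this replacement is sketched below with Python's re module. Note that Python's regular expression syntax is not identical to POSIX ERE, although the constructs used here (escaped brackets, a negated character class, a group and the back-reference \1) behave the same way. The names WIKI_LINK and wiki_to_html are illustrative, and the pattern assumes that the link text contains no closing bracket.

```python
import re

# \[\[ and \]\] match the literal brackets; ([^\]]+) stores the link text.
WIKI_LINK = re.compile(r"\[\[([^\]]+)\]\]")

def wiki_to_html(text):
    """Replace every [[x]] by <a href="x.html">x</a>."""
    return WIKI_LINK.sub(r'<a href="\1.html">\1</a>', text)

print(wiki_to_html("See [[Automata]] and [[DFA]]."))
```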
Sander Bruggink Automaten und Formale Sprachen 145
©xkcd.com
Regular Languages Regular Expressions
Summary: Describing Regular Languages
DFA → regular grammar: arrow → production
Regular grammar → NFA: production → arrow
NFA → DFA: subset construction
Regular expression → NFA: inductive algorithm
NFA → regular expression: state elimination algorithm
Sander Bruggink Automaten und Formale Sprachen 147
Regular Languages Regular Expressions
The Pumping Lemma
We have now looked at four formalisms with which regular languages can be described: regular grammars, DFAs, NFAs and regular expressions.
Now, we will learn about a proof method to show that a certain language is not regular.
[Figure: the regular languages (described by regular grammars, DFAs, NFAs and regular expressions) inside the set of all languages; label: Pumping Lemma]
Sander Bruggink Automaten und Formale Sprachen 148
Regular Languages Das Pumping-Lemma
The Pigeon Hole Principle
Pigeonhole principle
When you want to distribute m objects over n sets and m > n, then there must be at least one set which contains at least two objects.
The pigeonhole principle for finite automata
When an automaton with n states has a path of length m and m ≥ n, then there is at least one state which occurs on the path at least twice.
Sander Bruggink Automaten und Formale Sprachen 150
Regular Languages Das Pumping-Lemma
The Pumping Lemma
Each path with more transitions than the automaton has states contains a loop.
[Figure: path labelled u into state z, loop labelled v on z, path labelled w leaving z]
This loop can be traversed multiple times (or not at all). Thus, the word uvw is “pumped”, and one sees that the words uw, uv²w, uv³w, . . . are also in the language of the automaton.
Remark: We write v^i = v . . . v (i times).
We can also assume the following conditions for u, v, w, where n is the number of states in the automaton:
1 |v| ≥ 1: the loop is not trivial and contains at least one transition.
2 |uv| ≤ n: the state is reached for the second time after at most n transitions.
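The loop argument can be traced mechanically. The following Python sketch is an illustration only: the dictionary encoding of δ, the names `pumping_split` and `accepts`, and the small example DFA (even number of a's) are assumptions that do not come from the lecture. It runs a DFA on a word, stops at the first state that repeats, and returns the resulting u, v, w.

```python
def pumping_split(delta, start, word):
    """Split word into (u, v, w) at the first state that repeats on the run.

    delta maps (state, symbol) -> state; start is the initial state.
    Returns None if no state repeats while reading the word.
    """
    seen = {start: 0}          # state -> position at which it was first visited
    state = start
    for pos, symbol in enumerate(word, start=1):
        state = delta[(state, symbol)]
        if state in seen:      # pigeon hole principle: state visited before
            first = seen[state]
            return word[:first], word[first:pos], word[pos:]
        seen[state] = pos
    return None


def accepts(delta, start, finals, word):
    """Run the DFA on word and report whether it ends in a final state."""
    state = start
    for symbol in word:
        state = delta[(state, symbol)]
    return state in finals


# Hypothetical example DFA over {a, b}: accepts words with an even number of a's.
delta = {("even", "a"): "odd", ("even", "b"): "even",
         ("odd", "a"): "even", ("odd", "b"): "odd"}

u, v, w = pumping_split(delta, "even", "abba")
print(repr(u), repr(v), repr(w))
for i in range(4):
    print(i, accepts(delta, "even", {"even"}, u + v * i + w))
```

Because the split happens at the first repeated state, the returned v is non-empty and uv is no longer than the number of states, which matches the two conditions above.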
Example: M = ({z0, z1, z2, zE}, {a, b, c}, δ, z0, {zE})
[Figure: transition diagram of M with the states z0, z1, z2 and zE and transitions labelled a, b and c.]
x = ac · cc · a ∈ T(M), with u = ac, v = cc and w = a
uv^0w = ac · a ∈ T(M)
uv^2w = ac · cc · cc · a ∈ T(M)
. . .
From this property of finite automata we derive a property of regular languages.
Pumping property for regular languages
A language L has the pumping property when there exists a natural number n such that all words x ∈ L with |x| ≥ n can be decomposed into x = uvw, such that the following conditions hold:
1 |v| ≥ 1,
2 |uv| ≤ n and
3 for all i = 0, 1, 2, . . . it holds that uv^i w ∈ L.
Pumping Lemma for Regular Languages (Theorem)
When L is a regular language, then L has the pumping property.
Pumping Lemma
Let L be a language.
L is regular ⇒ ∃n ∈ N ∀x ∈ L with |x| ≥ n ∃u, v, w with x = uvw, |v| ≥ 1, |uv| ≤ n such that ∀i ∈ N: uv^i w ∈ L
⇔
∀n ∈ N ∃x ∈ L with |x| ≥ n ∀u, v, w with x = uvw, |v| ≥ 1, |uv| ≤ n: ∃i ∈ N with uv^i w ∉ L ⇒ L is not regular
Pumping Lemma (alternative formulation)
Let L be a language. Suppose that for each natural number n we can find a word x ∈ L with |x| ≥ n, such that for all decompositions x = uvw with
1 |v| ≥ 1,
2 |uv| ≤ n
there is an i with uv^i w ∉ L. Then L is not regular.
That is, we must show that for every n (for every possible number of states) there is a word in L that is at least as long as n and that has no “pumpable” decomposition.
“Cooking recipe” for the pumping lemma
Let L be a language (example: {a^k b^k | k ≥ 0}). We want to show that it is not regular.
1 Take an arbitrary number n. This number must not be chosen freely: the argument has to work for every n.
2 Choose a word x ∈ L with |x| ≥ n. To make sure that the word has at least length n, n should occur (for example as an exponent) in the description of the word.
Example: x = a^n b^n
3 Consider all possible decompositions x = uvw with the following constraints: |v| ≥ 1 and |uv| ≤ n.
Example: here there is only one kind of decomposition: u = a^j, v = a^l, w = a^m b^n with j + l + m = n and l ≥ 1.
4 Choose for each of these decompositions an i (in each case it can be a different i) such that uv^i w ∉ L. (In many cases i = 0 and i = 2 are good choices.)
Example: choose i = 2, then uv^2w = a^(j+2l+m) b^n ∉ L, since j + 2l + m ≠ n.
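For small n the recipe can be checked by brute force. The Python sketch below is only a sanity check, not a proof, since the lemma needs the argument for every n. The function names are hypothetical. For x = a^n b^n it enumerates every decomposition allowed by step 3 and confirms that the choice i = 2 from step 4 always leaves the language.

```python
def in_language(word):
    """Membership test for {a^k b^k | k >= 0}."""
    k = len(word) // 2
    return word == "a" * k + "b" * k


def decompositions(x, n):
    """Yield every split x = uvw with |v| >= 1 and |uv| <= n."""
    for end_u in range(0, n):
        for end_v in range(end_u + 1, min(n, len(x)) + 1):
            yield x[:end_u], x[end_u:end_v], x[end_v:]


def refuted(n):
    """True if every admissible split of a^n b^n leaves the language for i = 2."""
    x = "a" * n + "b" * n
    return all(not in_language(u + v * 2 + w) for u, v, w in decompositions(x, n))


print(all(refuted(n) for n in range(1, 8)))
```

Since |uv| ≤ n forces v to consist of a's only, every pumped word has more a's than b's, which is what the check observes.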
Example 1
Let Σ = {a, b}. Show that the language {a^(2^k) | k ∈ N} is not regular.
Example 2
Let Σ = {a, b, c}. Show that the language {a^k b^ℓ c^m | k ≥ 1, ℓ ≤ m} is not regular.
Remark on the Pumping Lemma
Faulty application of the Pumping Lemma
“When L has the pumping property, then L is regular.” ✗ (This converse does not hold.)
There are non-regular languages that satisfy the pumping property.
For example: L = {a^k b^m c^m | k, m ≥ 1} ∪ {b^k c^m | k, m ≥ 0}. L satisfies the pumping property.
But L is not regular (proof follows later).
Regular Languages Equivalence Relations and Minimal Automata
Brief Recapitulation: Equivalence Relations
What is an equivalence relation?
We first repeat the definition of a relation:
Relation
A (binary, homogeneous) relation R on a set M is a subset R ⊆ M × M. Instead of (m1, m2) ∈ R we usually write m1 R m2.
Graphical representation:
[Figure: (a, b) ∈ R drawn as an arrow from a to b.]
Equivalence relation
An equivalence relation R on a set M is a relation R ⊆ M × M that has the following properties:
R is reflexive, that is, (a, a) ∈ R for all a ∈ M.
R is symmetric, that is, if (a, b) ∈ R, then also (b, a) ∈ R.
R is transitive, that is, from (a, b) ∈ R and (b, c) ∈ R it follows that (a, c) ∈ R.
Here, a, b and c are arbitrary elements of M.
[Figure: graphical illustration of the three properties. Reflexive: a; symmetric: a, b; transitive: a, b, c.]
Equivalence Classes
Equivalence class
Let R be an equivalence relation on M and m ∈ M. The equivalence class [m]_R of m is the set
[m]_R = {n ∈ M | (n, m) ∈ R}
Sometimes one writes only [m] when it is clear which relation is meant.
Properties of equivalence classes
Let R be an equivalence relation on M and m1, m2 ∈ M. Then it holds either that
[m1]_R = [m2]_R
or that
[m1]_R ∩ [m2]_R = ∅.
Additionally it holds that
M = ⋃_{m ∈ M} [m]_R.
That is, two equivalence classes are either equal or disjoint. Additionally they completely cover M. It is also said: the equivalence classes form a partition of M.
Acceptance Equivalence
[Figure: DFA with the states 1 to 6 over the alphabet {a, b}.]
Is this the smallest automaton which accepts the same language?
The states 4 and 5 are acceptance equivalent. The same holds for the states 2 and 3. These states can be merged:
[Figure: the merged DFA with the states 1, 2/3, 4/5 and 6.]
Acceptance equivalence (Definition)
Let a DFA M = (Z, Σ, δ, z0, E) be given. Two states z1, z2 ∈ Z are acceptance equivalent when it holds for all words w ∈ Σ∗ that
δ(z1, w) ∈ E ⇐⇒ δ(z2, w) ∈ E.
Acceptance equivalent states can be merged ⇒ minimal automaton
Minimal Automata
Algorithm minimal automaton
Input: DFA M
Output: sets of acceptance equivalent states
1 Remove the states which are not reachable from the start state.
2 Create a table of all (unordered) state pairs {z, z′} with z ≠ z′.
3 Mark all pairs {z, z′} with z ∈ E and z′ ∉ E (or vice versa).
(z, z′ are certainly not acceptance equivalent.)
4 For each unmarked pair {z, z′} and each a ∈ Σ, test whether {δ(z, a), δ(z′, a)} is already marked. If yes: mark {z, z′}.
(From z, z′ there are transitions to states that are not acceptance equivalent, so they cannot be acceptance equivalent either.)
5 Repeat the previous step until no more changes in the table are possible.
6 For all pairs {z, z′} which are still unmarked, it holds that z and z′ are acceptance equivalent.
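Steps 2 to 6 can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's reference implementation: the function name and the dictionary encoding of δ are assumptions, and step 1 (removing unreachable states) is assumed to have been done already. The example transition table is modelled on the six-state automaton of this section; the transitions for (4, b), (5, b) and the loop on state 6 are guesses, because they cannot be read off the extracted slides.

```python
from itertools import combinations


def acceptance_equivalent_pairs(states, alphabet, delta, finals):
    """Table-filling algorithm: return the set of unmarked (equivalent) pairs.

    Assumes a complete DFA in which every state is reachable.
    """
    pairs = [frozenset(p) for p in combinations(states, 2)]      # step 2
    # Step 3: mark pairs consisting of one final and one non-final state.
    marked = {p for p in pairs if len(p & finals) == 1}
    changed = True
    while changed:                                               # step 5
        changed = False
        for p in pairs:
            if p in marked:
                continue
            z1, z2 = tuple(p)
            for a in alphabet:                                   # step 4
                succ = frozenset({delta[(z1, a)], delta[(z2, a)]})
                if succ in marked:
                    marked.add(p)
                    changed = True
                    break
    return {p for p in pairs if p not in marked}                 # step 6


states = [1, 2, 3, 4, 5, 6]
delta = {(1, "a"): 3, (1, "b"): 2, (2, "a"): 1, (2, "b"): 4,
         (3, "a"): 1, (3, "b"): 5, (4, "a"): 6, (4, "b"): 5,   # (4, b) assumed
         (5, "a"): 6, (5, "b"): 4, (6, "a"): 6, (6, "b"): 6}   # (5, b), (6, *) assumed
result = acceptance_equivalent_pairs(states, "ab", delta, {6})
print(sorted(sorted(p) for p in result))
```

Under these assumptions the sketch leaves exactly the pairs {2, 3} and {4, 5} unmarked. The order in which pairs get marked depends on the iteration order and may differ from a hand-made table.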
Executing the minimal automaton algorithm on the following automaton:
[Figure: the DFA with the states 1 to 6 from the acceptance equivalence example, next to an empty table with the rows 2 to 6 and the columns 1 to 5.]
Create a table of all pairs of states
(1) Mark pairs of final and non-final states
(2) Mark {2, 4} because δ(2, a) = 1, δ(4, a) = 6 and {1, 6} is marked
(3) Mark {3, 5} because δ(3, a) = 1, δ(5, a) = 6 and {1, 6} is marked
(4) Mark {2, 5} because δ(2, a) = 1, δ(5, a) = 6 and {1, 6} is marked
(5) Mark {3, 4} because δ(3, a) = 1, δ(4, a) = 6 and {1, 6} is marked
(6) Mark {1, 5} because δ(1, a) = 3, δ(5, a) = 6 and {3, 6} is marked
(7) Mark {1, 4} because δ(1, a) = 3, δ(4, a) = 6 and {3, 6} is marked
(8) Mark {1, 3} because δ(1, b) = 2, δ(3, b) = 5 and {2, 5} is marked
(9) Mark {1, 2} because δ(1, b) = 2, δ(2, b) = 4 and {2, 4} is marked
Resulting table (each number is the step in which the pair was marked; "-" is an unmarked pair):
2 | 9
3 | 8  -
4 | 7  2  5
5 | 6  4  3  -
6 | 1  1  1  1  1
  | 1  2  3  4  5
The remaining state pairs {2, 3} and {4, 5} cannot be marked: they are acceptance equivalent.
Hints for the minimization algorithm:
Create the table in such a way that each pair of states occurs exactly once: 2, . . . , n vertically and 1, . . . , n − 1 horizontally.
Please specify which pairs of states are marked in which order!
(In Schöning’s book only asterisks (∗) are used, but from those the order and reason for the markings cannot be reconstructed during the correction.)
Myhill–Nerode Equivalence
Acceptance equivalence (Recapitulation)
Let a DFA M = (Z, Σ, δ, z0, E) be given. Two states z1, z2 ∈ Z are acceptance equivalent when it holds for all words w ∈ Σ∗ that
δ(z1, w) ∈ E ⇐⇒ δ(z2, w) ∈ E.
When we extend acceptance equivalence to words (instead of states), we obtain the Myhill–Nerode equivalence.
Myhill–Nerode equivalence (Definition)
Let a language L ⊆ Σ∗ and words x, y ∈ Σ∗ be given. We define an equivalence relation ≡_L by: x ≡_L y if and only if
for all z ∈ Σ∗ it holds that (xz ∈ L ⇐⇒ yz ∈ L).
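To get a feeling for the definition, one can search for a suffix z that separates two words. The Python sketch below uses hypothetical names and the language {a^k b^k | k ≥ 0} as an assumed example. It tries all suffixes up to a bounded length. Finding a suffix proves that the two words are not equivalent. Finding none is only evidence, because the definition quantifies over all of Σ∗.

```python
from itertools import product


def distinguishing_suffix(x, y, in_language, alphabet, max_len):
    """Return a suffix z with (xz in L) != (yz in L), or None if no such z
    exists among all suffixes of length <= max_len."""
    for length in range(max_len + 1):
        for letters in product(alphabet, repeat=length):
            z = "".join(letters)
            if in_language(x + z) != in_language(y + z):
                return z
    return None


def in_akbk(word):
    """Membership test for {a^k b^k | k >= 0}."""
    k = len(word) // 2
    return word == "a" * k + "b" * k


print(distinguishing_suffix("aa", "aaa", in_akbk, "ab", 4))
print(distinguishing_suffix("aab", "aaabb", in_akbk, "ab", 4))
```

The first call finds a separating suffix, so aa and aaa lie in different classes. The second call finds none within the bound, which fits the observation that both words need exactly one more b.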
Let L = {a^k b^k | k ∈ N}. Is it the case that:
a^4 b^3 ≡_L a^3 b^2 ?
a^2 b^2 ≡_L a^3 b^2 ?
a^4 b^2 ≡_L a^3 b^2 ?
abb ≡_L baba ?
What are the Myhill–Nerode equivalence classes of the following languages?
L1 = {w ∈ {a, b}∗ | #a(w) even}
L2 = {w ∈ {a, b, c}∗ | the subword abc does not occur in w}
Myhill–Nerode equivalence and regularity (Theorem)
A language L ⊆ Σ∗ is regular if and only if ≡_L has finitely many equivalence classes.
L is regular ⇒ ≡_L has finitely many equivalence classes:
Let L be a regular language and M = (Z, Σ, δ, z0, E) a DFA with T(M) = L. We define the equivalence relation ≡_M by
x ≡_M y ⇐⇒ δ(z0, x) = δ(z0, y) for x, y ∈ Σ∗.
The number of equivalence classes of ≡_M is equal to the number of (reachable) states of M, and is therefore finite.
We can show that x ≡_M y implies x ≡_L y. Assume x ≡_M y and take an arbitrary z ∈ Σ∗. It holds that:
xz ∈ L ⇐⇒ δ(z0, xz) ∈ E (Def. acc. language)
⇐⇒ δ(δ(z0, x), z) ∈ E (Def. δ)
⇐⇒ δ(δ(z0, y), z) ∈ E (x ≡_M y)
⇐⇒ δ(z0, yz) ∈ E (Def. δ)
⇐⇒ yz ∈ L. (Def. acc. language)
From this, it follows that x ≡_L y.
Therefore ≡_M relates at most as many words as ≡_L and thus has at least as many equivalence classes as ≡_L. Since ≡_M has finitely many equivalence classes, it follows that ≡_L has finitely many equivalence classes as well.
≡_L has finitely many equivalence classes ⇒ L is regular:
Assume that ≡_L has finitely many equivalence classes. We construct the finite automaton M0 = (Z, Σ, δ, z0, E) for L, which is defined as follows:
Z = {[w]_{≡_L} | w ∈ Σ∗} (set of equivalence classes)
z0 = [ε]_{≡_L}
E = {[w]_{≡_L} | w ∈ L}
δ([w]_{≡_L}, a) = [wa]_{≡_L}
From δ([w]_{≡_L}, a) = [wa]_{≡_L} it follows that δ([w]_{≡_L}, u) = [wu]_{≡_L} for all words u ∈ Σ∗.
It holds that
x ∈ T(M0) ⇐⇒ δ([ε], x) ∈ E (Def. acc. language)
⇐⇒ [x] ∈ E (see above)
⇐⇒ x ∈ L (Def. final states)
Therefore T(M0) = L. □
With the Myhill–Nerode theorem we can show both that a language is regular and that a language is not regular.
Examples:
The language L1 = {a^k b^k | k ≥ 0} has infinitely many equivalence classes and is not regular.
The language L2 = {a^n b^m c^m | n, m ≥ 1} ∪ {b^m c^n | n, m ≥ 1} has infinitely many equivalence classes and is not regular. (However, it does satisfy the conditions of the pumping lemma!)
Let M0 be the DFA that is constructed from the equivalence classes. For an arbitrary DFA M with T(M) = T(M0) it holds that
≡_M ⊆ ≡_L = ≡_M0.
That means that M0 can be constructed from M by merging acceptance equivalent states.
In other words: M0 is the minimal DFA for L. All other minimal DFAs that accept the same language are equal to M0 (after renaming states).
Minimal Automata Again
For non-deterministic automata the following holds:
There is no unique minimal NFA: there can be more than one.
The following NFAs accept the language L((0|1)∗1) and both have two states. (The language cannot be accepted with only a single state.)
[Figure: two different NFAs, each with the states 1 and 2 and transitions labelled 0 and 1.]
Let a DFA M be given. Then a minimal NFA that accepts T(M) always has at most as many states as M. (Because M is already an NFA itself.)
In fact, a minimal NFA can be exponentially smaller than the minimal DFA.
See for example the languages
Lk = {x ∈ {0, 1}∗ | |x| ≥ k, the k-th last symbol of x is 0}.
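The gap can be observed experimentally. The Python sketch below builds one possible NFA for Lk with k + 1 states (this particular NFA and all names are assumptions, not taken from the slides) and counts the subsets that the subset construction reaches. The count only shows the size of the DFA produced by that construction. That no smaller DFA exists is a separate argument.

```python
def subset_construction_size(delta, initials, alphabet):
    """Number of state subsets reachable in the subset construction of an NFA.

    delta maps (state, symbol) -> set of successor states.
    """
    start = frozenset(initials)
    seen, stack = {start}, [start]
    while stack:
        subset = stack.pop()
        for a in alphabet:
            nxt = frozenset(t for s in subset for t in delta.get((s, a), ()))
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen)


def nfa_for_lk(k):
    """Hypothetical NFA with states 0..k for 'the k-th last symbol is 0'.

    State 0 loops on every symbol and guesses on a 0 that this is the
    k-th last symbol; states 1..k-1 count the remaining symbols; k is final.
    """
    delta = {(0, "0"): {0, 1}, (0, "1"): {0}}
    for i in range(1, k):
        delta[(i, "0")] = {i + 1}
        delta[(i, "1")] = {i + 1}
    return delta


for k in range(1, 6):
    print(k + 1, subset_construction_size(nfa_for_lk(k), {0}, "01"))
```

Each line prints the number of NFA states next to the number of reachable subsets, so the growth of the second column can be compared with 2^k.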
Regular Languages Closure Properties
Closure Properties
Closure properties (Definition)
Let a set M and an n-ary operator f : M × · · · × M → M be given. We say that a set M′ ⊆ M is closed under f when it holds for n arbitrary elements m1, . . . , mn ∈ M′ that f(m1, . . . , mn) ∈ M′.
Example: Let M = N and M′ = {i | i ∈ N and i is even}. Let f be the multiplication operator, that is, f(x, y) = x · y. M′ is closed under f, because the product of two even numbers is even.
Now, let g be defined as g(x) = x/2, where / is integer division. M′ is not closed under g, because for example g(6) = 6/2 = 3.
Here we consider closure properties of the set of regular languages. The interesting questions are:
When L1, L2 are regular, are L1 ∪ L2, L1 ∩ L2, L1L2, L̄1 = Σ∗\L1 (complement) and L1∗ also regular?
Short answer: The regular languages are closed under all the mentioned operations.
Closure under union
When L1 and L2 are regular, then L1 ∪ L2 is also regular.
Proof: Since L1 and L2 are regular by assumption, there are regular expressions α1 and α2 such that L(α1) = L1 and L(α2) = L2. The regular expression α1 | α2 generates the language L1 ∪ L2 and therefore that language is regular.
Closure under concatenation
When L1 and L2 are regular, then L1L2 = {w1w2 | w1 ∈ L1 and w2 ∈ L2} is also regular.
Proof: Since L1 and L2 are regular by assumption, there are regular expressions α1 and α2 such that L(α1) = L1 and L(α2) = L2. The regular expression α1α2 generates the language L1L2 and therefore that language is regular.
Closure under the star operation
When L is a regular language, then L∗ is also regular.
Proof: Since L is regular by assumption, there is a regular expression α with L(α) = L. The regular expression α∗ generates the language L∗. Because there is a regular expression for this language, it is regular.
Closure under complement
When L is a regular language, then its complement L̄ = Σ∗\L is also a regular language.
Remark: when we construct the complement, we must always specify relative to which set we construct it. In this case this is the set Σ∗, the set of all words over Σ.
Proof: Since L is regular by assumption, there is a DFA M = (Z, Σ, δ, z0, E) with T(M) = L. This automaton is transformed into an automaton M′ for the complement L̄ = Σ∗\L by exchanging final and non-final states. That is: M′ = (Z, Σ, δ, z0, Z\E).
Now we have: w ∈ L ⇐⇒ δ(z0, w) ∈ E ⇐⇒ δ(z0, w) ∉ Z\E ⇐⇒ w ∉ T(M′).
Since there is an automaton M′ for L̄, L̄ is regular.
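The construction amounts to one set difference. The following Python sketch shows it on a hypothetical two-state DFA; the encoding and the example automaton are assumptions for illustration. It relies on δ being total, as it is for a DFA in the sense of this lecture. Applied to an NFA or a partial automaton, the same swap would in general not yield the complement.

```python
def run(delta, start, word):
    """Return the state the DFA reaches after reading word."""
    state = start
    for symbol in word:
        state = delta[(state, symbol)]
    return state


def complement(states, delta, start, finals):
    """Swap final and non-final states; delta must be total (a complete DFA)."""
    return states, delta, start, states - finals


# Hypothetical DFA over {a, b} accepting exactly the words that end in a.
states = {"q0", "q1"}
delta = {("q0", "a"): "q1", ("q0", "b"): "q0",
         ("q1", "a"): "q1", ("q1", "b"): "q0"}
finals = {"q1"}
_, _, _, co_finals = complement(states, delta, "q0", finals)

for word in ["", "a", "ab", "bba"]:
    state = run(delta, "q0", word)
    print(repr(word), state in finals, state in co_finals)
```

For every word exactly one of the two printed booleans is true, mirroring the chain of equivalences in the proof.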
Closure under complement
[Figure: a DFA with the states z0, z1, z2 and zE over the alphabet {a, b, c}, and the DFA obtained from it by exchanging final and non-final states.]
Closure under intersection
When L1 and L2 are regular languages, then L1 ∩ L2 is also regular.
Proof: By De Morgan's law, L1 ∩ L2 is the complement of L̄1 ∪ L̄2, where L̄ = Σ∗\L. We already know that regular languages are closed under union and complement.
Cross product construction for DFAs
There is also a direct construction. In this construction, two automata are synchronized with each other. This is done by building the cross product of the two state sets.
Let M1 = (Z1, Σ, δ1, s1, E1) and M2 = (Z2, Σ, δ2, s2, E2) be DFAs with T(M1) = L1 and T(M2) = L2. The following DFA M accepts the language L1 ∩ L2:
M = (Z1 × Z2, Σ, δ, (s1, s2), E1 × E2),
where δ((z1, z2), a) = (δ1(z1, a), δ2(z2, a)).
M accepts a word w if and only if both M1 and M2 accept it.
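The definition of M translates almost literally into code. In the Python sketch below a DFA is encoded as a tuple (states, delta, start, finals); this encoding, the function names and the two example automata are assumptions made for illustration.

```python
def product_dfa(dfa1, dfa2, alphabet):
    """Cross product of two complete DFAs given as (states, delta, start, finals)."""
    states1, delta1, start1, finals1 = dfa1
    states2, delta2, start2, finals2 = dfa2
    states = {(p, q) for p in states1 for q in states2}
    delta = {((p, q), a): (delta1[(p, a)], delta2[(q, a)])
             for (p, q) in states for a in alphabet}
    finals = {(p, q) for p in finals1 for q in finals2}
    return states, delta, (start1, start2), finals


def accepts(dfa, word):
    """Run the DFA on word and report whether it ends in a final state."""
    _states, delta, start, finals = dfa
    state = start
    for symbol in word:
        state = delta[(state, symbol)]
    return state in finals


# Hypothetical examples: even number of a's, and words ending in b.
even_a = ({0, 1}, {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}, 0, {0})
ends_b = ({"n", "y"}, {("n", "a"): "n", ("n", "b"): "y",
                       ("y", "a"): "n", ("y", "b"): "y"}, "n", {"y"})
both = product_dfa(even_a, ends_b, "ab")

for word in ["b", "ab", "aab", "aa"]:
    print(word, accepts(both, word))
```

The sketch builds all of Z1 × Z2. An implementation that cares about size would typically generate only the pairs reachable from (s1, s2).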
Cross product construction for NFAs
Let M1 = (Z1, Σ, δ1, S1, E1) and M2 = (Z2, Σ, δ2, S2, E2) be NFAs with T(M1) = L1 and T(M2) = L2. The following NFA M accepts the language L1 ∩ L2:
M = (Z1 × Z2, Σ, δ, S1 × S2, E1 × E2),
where δ((z1, z2), a) = δ1(z1, a) × δ2(z2, a).
M accepts a word w if and only if both M1 and M2 accept it.
Why are closure properties interesting?
To show that a language is regular. Complex regular languages can be built from simple ones.
To show that a language is not regular. Sometimes it is simpler to prove that the complement of a language, or the intersection of a language with a regular language, is not regular than to show directly that the language itself is not regular.
Closure Properties: The Adventure Again
[Figure: the adventure map with the states 1 to 16.]
The Treasure Rule
You must find at least two treasures.
The Door Rule
You can only go through a door when you have found a key before. (This key can be used arbitrarily often and fits every door.)
The Dragon Rule
Immediately after the encounter with a dragon you have to jump into a river, because otherwise the dragon will set you on fire. This is no longer necessary once you have found a sword, because you can then kill the dragon.
Alphabet symbols:
Dragon (D):
Sword (W):
River (F):
Arch (B):
Door (T):
Key (L):
Treasure (A):
The rules can be described by the following (non-deterministic) finite automata.
[Figure: three automata, named A, T and D, for the treasure rule, the door rule and the dragon rule.]
Let M be the automaton which describes the adventure map. Let
L_M = T(M) be the set of all paths in the map from a start to an end state,
L_A = T(A) be the set of all paths that satisfy the treasure rule,
L_T = T(T) be the set of all paths that satisfy the door rule, and
L_D = T(D) be the set of all paths that satisfy the dragon rule.
Let AM be the set of all paths through the adventure map which satisfy all conditions. Then:
AM = L_M ∩ L_A ∩ L_T ∩ L_D
Is there a solution to the adventure?
Regular Languages Algorithms for Problems
Algorithms
We now discuss whether there are procedures or algorithms to answer questions about regular languages.
The general form of the questions is:
Let regular languages L1, L2 be given. Does it hold for these languages that . . . ?
We assume that the regular languages are given as DFAs, NFAs, grammars or regular expressions.
Problems
Word problem: Let a regular language L and w ∈ Σ∗ be given. Does w ∈ L hold?
Emptiness problem: Let a regular language L be given. Does L = ∅ hold?
Finiteness problem: Let a regular language L be given. Is L finite?
Intersection problem: Let two regular languages L1 and L2 be given. Does L1 ∩ L2 = ∅ hold?
Inclusion problem: Let two regular languages L1 and L2 be given. Does L1 ⊆ L2 hold?
Equivalence problem: Let two regular languages L1 and L2 be given. Does L1 = L2 hold?
Word problem (w ∈ L?)
Let a regular language L and w ∈ Σ∗ be given.
Solution: Determine a DFA M for L and track the state transitions of M while reading w.
Final state reached ⇒ w ∈ L
Non-final state reached ⇒ w ∉ L
Emptiness problem (L = ∅?)
Let a regular language L be given.
Solution: Determine an NFA M for L.
L = ∅ ⇐⇒ there is no path from an initial to a final state.
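The emptiness test is a plain graph search. A minimal Python sketch follows; the encoding of δ as a dictionary from (state, symbol) to a set of successors and the function name are assumptions. It ignores the transition labels and performs a breadth-first search from the initial states.

```python
from collections import deque


def is_empty(delta, initials, finals):
    """True iff no final state is reachable from an initial state.

    delta maps (state, symbol) -> set of successor states (an NFA).
    """
    successors = {}
    for (state, _symbol), targets in delta.items():
        successors.setdefault(state, set()).update(targets)
    seen = set(initials)
    queue = deque(initials)
    while queue:
        state = queue.popleft()
        if state in finals:
            return False
        for nxt in successors.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True


print(is_empty({(0, "a"): {1}}, {0}, {2}))
print(is_empty({(0, "a"): {1}, (1, "b"): {2}}, {0}, {2}))
```

The search visits every state at most once, so the test runs in time linear in the size of the automaton.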
Finiteness problem (is L finite?)
Let a regular language L be given.
Solution: Determine an NFA M for L.
L is infinite
⇐⇒ there are infinitely many paths from an initial to a final state in M
⇐⇒ there is a reachable cycle in M from which a final state is reachable
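The cycle criterion can be sketched as follows in Python. The encoding and all names are assumptions, and the NFA is assumed to have no ε-transitions, so that every cycle reads at least one symbol. The sketch first restricts the automaton to the states that lie on some path from an initial to a final state and then looks for a cycle among them.

```python
def reachable(edges, sources):
    """All states reachable from sources along edges (sources included)."""
    seen, stack = set(sources), list(sources)
    while stack:
        state = stack.pop()
        for nxt in edges.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_finite(delta, initials, finals):
    """delta maps (state, symbol) -> set of successors; no epsilon transitions."""
    forward, backward = {}, {}
    for (state, _symbol), targets in delta.items():
        for target in targets:
            forward.setdefault(state, set()).add(target)
            backward.setdefault(target, set()).add(state)
    # Useful states: reachable from an initial state and able to reach a final one.
    useful = reachable(forward, initials) & reachable(backward, finals)
    restricted = {s: forward.get(s, set()) & useful for s in useful}
    # The language is infinite iff some useful state can reach itself again.
    return not any(s in reachable(restricted, restricted[s]) for s in useful)


a_star_b = {(0, "a"): {0}, (0, "b"): {1}}
only_ab = {(0, "a"): {1}, (1, "b"): {2}}
print(is_finite(a_star_b, {0}, {1}), is_finite(only_ab, {0}, {2}))
```

A cycle that cannot reach a final state does not count, which is why the restriction to useful states comes first.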
Intersection problem (L1 ∩ L2 = ∅?)
Let regular languages L1 and L2 be given.
Solution: Determine DFAs M1 and M2 for L1 and L2 and construct their cross product. Now apply the emptiness test to the cross product.
Inclusion problem (L1 ⊆ L2?)
Let regular languages L1, L2 be given.
Solution: L1 ⊆ L2 holds if and only if L1 ∩ L̄2 = ∅, where L̄2 = Σ∗\L2 is the complement of L2. Since intersection and complement can be determined constructively and an emptiness test exists, the inclusion problem can be solved.
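The three ingredients (complement, cross product, emptiness test) can be combined in one search. The Python sketch below assumes complete DFAs encoded as (delta, start, finals); the encoding, the function name and the example automata are assumptions. Instead of building the automata explicitly, it explores the reachable part of the product of M1 with the complement of M2 and returns a counterexample word if the inclusion fails.

```python
def included(dfa1, dfa2, alphabet):
    """Decide T(M1) ⊆ T(M2) for complete DFAs given as (delta, start, finals).

    Searches the product of M1 with the complement of M2 for a state that is
    final in M1 and non-final in M2.
    Returns (True, None) or (False, counterexample word).
    """
    delta1, start1, finals1 = dfa1
    delta2, start2, finals2 = dfa2
    seen = {(start1, start2): ""}       # product state -> a word reaching it
    stack = [(start1, start2)]
    while stack:
        p, q = stack.pop()
        word = seen[(p, q)]
        if p in finals1 and q not in finals2:
            return False, word
        for a in alphabet:
            nxt = (delta1[(p, a)], delta2[(q, a)])
            if nxt not in seen:
                seen[nxt] = word + a
                stack.append(nxt)
    return True, None


# Hypothetical examples over {a}: length divisible by 4, and even length.
mod4 = ({(i, "a"): (i + 1) % 4 for i in range(4)}, 0, {0})
mod2 = ({(i, "a"): (i + 1) % 2 for i in range(2)}, 0, {0})
print(included(mod4, mod2, "a"))
print(included(mod2, mod4, "a"))
```

Calling the function in both directions gives a test for the equivalence problem.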
Equivalence problem (L1 = L2?)
Let regular languages L1, L2 be given.
Solution 1: Determine the minimal DFAs for L1 and L2. L1 and L2 are equal exactly when the two DFAs are equal (possibly after renaming of states).
Solution 2: L1 = L2 holds if and only if L1 ⊆ L2 and L1 ⊇ L2. The inclusion problem is solvable.
Efficiency: The complexity of the described procedures depends on the representation of the regular languages.
For the equivalence problem:
L1, L2 given as DFAs ⇒ complexity O(n^2) (quadratically many steps relative to the size of the input)
L1, L2 given as grammars, regular expressions or NFAs ⇒ complexity NP-hard. This means, among other things, that no efficient algorithms to solve the problem are known.
More on complexity in the lecture “Berechenbarkeit und Komplexität” (Computability and Complexity).
Regular Languages Program Verification with Regular Languages
Application: Model Verification
[Figure: a system and a specification are each modelled, here as finite automata; the system model and the specification model are passed to a model checker, which answers yes or no.]
In our case:
The system model is an NFA Sys that accepts all possible system runs.
The specification model is an NFA Spec that accepts all allowed system runs.
We verify a safety property. That means that Spec models the allowed system runs. We try to find out whether
L(Sys) ⊆ L(Spec)
Application: Verification
Example: mutual exclusion
We consider two processes P1, P2 that try to access a shared resource.
Each process has a so-called critical area, in which it accesses the resource. At any time, only one process may be in its critical area.
The processes may use shared variables to synchronize. However, these variables are not semaphores, that is, there is no atomic operation which reads and writes a variable at the same time.
We want to show that mutual exclusion is ensured.
Attempt 1: Both processes P1, P2 use a single shared boolean variable f that is initialized with false.
Program code for P1, P2
while true do
1:   if (f = false) then
2:     f := true
3:     Enter critical area
       . . .
4:     Leave critical area
5:     f := false
     end
end
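Before building automata it can help to explore the interleavings of this program directly. The Python sketch below is an assumption-laden model, not the lecture's Grail-based procedure. Each process is represented by its current line label 1 to 5, every labelled line is one atomic step, a process counts as being in the critical area after executing line 3 and before executing line 4 (location 4 in the model), and the labels `enter` and `leave` stand in for BkB and VkB. A breadth-first search over all states (location of P1, location of P2, value of f) looks for a state in which both processes are in the critical area.

```python
from collections import deque


def step(loc, f):
    """One atomic step of a process at location loc; returns (label, loc', f')."""
    if loc == 1:                        # test of f; retry if f is true
        return ("f=false?", 2, f) if not f else ("f=true?", 1, f)
    if loc == 2:
        return ("f:=true", 3, True)
    if loc == 3:
        return ("enter", 4, f)
    if loc == 4:
        return ("leave", 5, f)
    return ("f:=false", 1, False)       # loc == 5


def find_violation():
    """Breadth-first search over all interleavings of P1 and P2."""
    start = (1, 1, False)
    trace = {start: []}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        loc1, loc2, f = state
        if loc1 == 4 and loc2 == 4:     # both processes in the critical area
            return trace[state]
        for proc in (1, 2):
            label, new_loc, new_f = step(loc1 if proc == 1 else loc2, f)
            nxt = (new_loc, loc2, new_f) if proc == 1 else (loc1, new_loc, new_f)
            if nxt not in trace:
                trace[nxt] = trace[state] + [f"({label})_{proc}"]
                queue.append(nxt)
    return None


print(find_violation())
```

Under this model the search returns a run in which both processes read f = false before either of them sets it to true. Such a run is a word of the system language that the specification automaton must reject.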
Alphabet:
(f := true)_1: P1 sets f to true             (f := true)_2: P2 sets f to true
(f := false)_1: P1 sets f to false           (f := false)_2: P2 sets f to false
(f = true?)_1: P1 reads f and f = true       (f = true?)_2: P2 reads f and f = true
(f = false?)_1: P1 reads f and f = false     (f = false?)_2: P2 reads f and f = false
BkB_1: P1 enters CA                          BkB_2: P2 enters CA
VkB_1: P1 leaves CA                          VkB_2: P2 leaves CA
Procedure:
Specify automata P1 and P2 for both processes.
Specify an automaton F for the value of the variable.
Calculate the cross product M_Sys of the above three automata. This automaton models the combined behaviour of the system.
Specify an automaton M_Spec for the specification.
Find out whether L(M_Sys) ⊆ L(M_Spec).
Modelling the processes:
The assignment
i: f := true
k: . . .
is modelled by a transition from state i to state k labelled (f := true)_1.
The conditional
i: if f = true then
j:   . . .
   end
k: . . .
is modelled by a transition from state i to state j labelled (f = true?)_1 and a transition from state i to state k labelled (f = false?)_1.
Description of the runs of process i as a finite automaton:
[Figure: the automaton P_i with the states 1 to 5. Its transitions are labelled (f = false?)_i, (f := true)_i, BkB_i, VkB_i, (f := false)_i and (f = true?)_i; in addition there are five transitions labelled ∆_i.]
with ∆_i = {(f := true)_j, (f := false)_j, (f = true?)_j, (f = false?)_j, BkB_j, VkB_j}, where j = 2 when i = 1, and j = 1 when i = 2.
Description of the boolean variable f by an automaton:
[Figure: the automaton F with the states 1 and 2. Its transitions are labelled with the read and write actions (f = false?)_k, (f = true?)_k, (f := false)_k and (f := true)_k for k ∈ {1, 2}; in addition there are two transitions labelled ∆_f.]
where ∆_f = {BkB_1, VkB_1, BkB_2, VkB_2}.
The language of all runs of the system is T(P1) ∩ T(P2) ∩ T(F).
The automaton WA describes all runs that satisfy mutual exclusion (the two processes are never in their critical areas at the same time).
[Figure: the automaton WA with the states 1, 2 and 3. Its transitions are labelled BkB_1, VkB_1, BkB_2, VkB_2, Σ\{BkB_1, BkB_2}, Σ\{VkB_1, BkB_2} and Σ\{BkB_1, VkB_2}.]
Now we have to show that T(P1) ∩ T(P2) ∩ T(F) ⊆ T(WA).
Sander Bruggink Automaten und Formale Sprachen 218
Regular Languages Program Verification with Regular Languages
Application: Verification
Encoding for Grail:
(f := true)1     a      (f := true)2     A
(f := false)1    b      (f := false)2    B
(f = true?)1     c      (f = true?)2     C
(f = false?)1    d      (f = false?)2    D
BkB1             x      BkB2             X
VkB1             y      VkB2             Y
Sander Bruggink Automaten und Formale Sprachen 219
Regular Languages Program Verification with Regular Languages
Application: Verification
Automaton files: p1.aut, p2.aut, f.aut, wa.aut.
Grail tools used:
fmcross aut1 < aut2 > res: generates the cross product of aut1 and aut2 and stores the result in res.
fmcment aut > res: generates the complement of aut and stores the result in res.
fmenum aut: enumerates the words that are accepted by aut.
Sander Bruggink Automaten und Formale Sprachen 220
Regular Languages Program Verification with Regular Languages
Application: Verification
Attempt 2: We now consider Lamport’s algorithm for mutual exclusion.
Here we consider two processes P1 and P2 with different program code and two shared boolean variables f1 and f2 (both initialized to false).
Sander Bruggink Automaten und Formale Sprachen 221
Regular Languages Program Verification with Regular Languages
Application: Verification
Process P1:
while true do
  1: f1 := true
  2: while (f2 = true?) do
       skip
     end
  3: Enter critical area
     . . .
  4: Leave critical area
  5: f1 := false
end
skip: null operation (does nothing)
Sander Bruggink Automaten und Formale Sprachen 222
Regular Languages Program Verification with Regular Languages
Application: Verification
Process P2:
while true do
  1: f2 := true
  2: if (f1 = true?) then
  3:   f2 := false
  4:   while (f1 = true?) do skip end
     else
  5:   Enter critical area
       . . .
  6:   Leave critical area
  7:   f2 := false
     end
end
Sander Bruggink Automaten und Formale Sprachen 223
Regular Languages Program Verification with Regular Languages
Application: Verification
In this case we use the following alphabet Σ:
Σ = {(f1 := false)i , (f2 := false)i , (f1 = false?)i , (f2 = false?)i ,
(f1 := true)i , (f2 := true)i , (f1 = true?)i , (f2 = true?)i ,
BkB i ,VkB i | i ∈ {1, 2}}
Sander Bruggink Automaten und Formale Sprachen 224
Regular Languages Program Verification with Regular Languages
Application: Verification
Automaton for the process P1:
[Figure: automaton P1 with states 1 to 5. Transition labels: (f1 := true)1, (f2 = true?)1, (f2 = false?)1, BkB1, VkB1, (f1 := false)1; the states carry ∆ loops.]
Where ∆ contains all actions of process P2.
Sander Bruggink Automaten und Formale Sprachen 225
Regular Languages Program Verification with Regular Languages
Application: Verification
Automaton for the process P2:
[Figure: automaton P2 with states 1 to 7. Transition labels: (f2 := true)2, (f1 = true?)2, (f1 = false?)2, (f2 := false)2, BkB2, VkB2; the states carry ∆ loops.]
Where ∆ contains all actions of process P1.
Sander Bruggink Automaten und Formale Sprachen 226
Regular Languages Program Verification with Regular Languages
Application: Verification
Automata for the two variables:
[Figure: automata F1 and F2, each with states 1 and 2. F1 has transitions labelled (f1 = false?)2, (f1 := false)1, (f1 := true)1, (f1 = true?)2 and ∆1 loops; F2 has transitions labelled (f2 = false?)1, (f2 := false)2, (f2 := true)2, (f2 = true?)1 and ∆2 loops.]
∆1 = {(f2 := true)2, (f2 := false)2, (f2 = true?)1, (f2 = false?)1, BkB1, BkB2, VkB1, VkB2}
Analogously for ∆2.
Sander Bruggink Automaten und Formale Sprachen 227
Regular Languages Program Verification with Regular Languages
Application: Verification
Encoding for Grail:
(f1 := true)1     a      (f2 := true)2     A      BkB1    x
(f1 := false)1    b      (f2 := false)2    B      BkB2    X
(f2 = true?)1     c      (f1 = true?)2     C      VkB1    y
(f2 = false?)1    d      (f1 = false?)2    D      VkB2    Y
Now the system runs are contained in the allowed runs. So, the algorithm is correct.
Sander Bruggink Automaten und Formale Sprachen 228
Regular Languages Program Verification with Regular Languages
Application: Verification
Summary Verification:
With the help of finite automata, we have modelled two protocols that should realise mutual exclusion.
With the help of the algorithms for the language inclusion and intersection problems, we have automatically tested whether or not these protocols are correct.
That means: automata and regular languages can be used for program verification.
Sander Bruggink Automaten und Formale Sprachen 229
Context Free Languages Context Free Grammars and Syntax Trees
Context Free Languages
Context free languages
Context free grammars and syntax trees
Word problem: the CYK-algorithm
Pumping lemma for context free languages
Automaton model: push-down automata
Closure properties and algorithms
Determinism/non-determinism in context free languages
Sander Bruggink Automaten und Formale Sprachen 230
Context Free Languages Context Free Grammars and Syntax Trees
Context Free Languages
Applications of context free languages
Describing the syntax of programming languages. Many techniques mentioned in the lecture are interesting for the construction of compilers.
Partly also for describing natural languages.
Sander Bruggink Automaten und Formale Sprachen 231
Context Free Languages Context Free Grammars and Syntax Trees
Context Free Grammars
A grammar is a 4-tuple G = (V, Σ, P, S), where V is a (finite) set of non-terminal symbols, Σ the (finite) alphabet, P a finite set of productions that consist of a left and a right side, and S ∈ V the start symbol.
A grammar is context free when all left sides of the rules consist of exactly one non-terminal symbol, and all right sides have at least one symbol.
Special rule for ε: When S is the start symbol, the rule S → ε can occur in the grammar, provided that S does not occur on the right side of a rule.
Sander Bruggink Automaten und Formale Sprachen 232
Context Free Languages Context Free Grammars and Syntax Trees
Context Free Grammars
Let Σ = {a, b}.
Example 1: Give a context free grammar G1 such that L(G1) = {a^n b^n | n ≥ 0}.
Example 2: Give a context free grammar G2 such that L(G2) = {a^k b^n a^m b^n | n, m, k ≥ 1}.
Sander Bruggink Automaten und Formale Sprachen 233
Context Free Languages Context Free Grammars and Syntax Trees
Special rule for ε again
Question: Is the special rule for ε really required? Can we always allow ε as right side?
Answer: yes, we can always allow ε as right side.
ε-free grammars (Theorem)
Let a context free grammar G = (V, Σ, P, S) be given, containing productions of the form A → w, where A ∈ V, w ∈ (V ∪ Σ)∗.
Then there is a grammar G′ = (V, Σ, P′, S) containing productions of the form A → w, where w ∈ (V ∪ Σ)+, such that L(G) = L(G′).
Sander Bruggink Automaten und Formale Sprachen 234
Context Free Languages Context Free Grammars and Syntax Trees
Special rule for ε again
Method to remove ε-productions:
1 Determine the set of variables V1 ⊆ V such that V1 = {A ∈ V | A ⇒∗ ε}, that is, the set of all variables from which the empty word can be derived.
2 Add for each production B → xAy with A ∈ V1, x, y ∈ (V ∪ Σ)∗ a production B → xy to the set of productions. (This production “simulates” the removing of A.)
3 Remove all productions of the form A → ε.
4 If ε ∈ L(G) (that is, S ∈ V1), add a new start variable S′ and the productions S′ → ε and S′ → S.
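The four steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the lecture's reference implementation: a grammar is represented as a dictionary from variables to sets of right sides, a right side is a tuple of symbols, and the empty tuple stands for ε. Instead of adding the productions B → xy one at a time, step 2 is realised by enumerating all ways of keeping or dropping the nullable symbols of a right side, which yields the same set of productions. The name `S'` for the new start variable is assumed to be unused, and the grammar `example` is only an illustration.

```python
from itertools import product

def nullable_variables(productions):
    """Step 1: all variables A with A =>* epsilon (fixed-point iteration)."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for left, rights in productions.items():
            if left in nullable:
                continue
            # The empty tuple () encodes epsilon; all() over it is True.
            if any(all(sym in nullable for sym in right) for right in rights):
                nullable.add(left)
                changed = True
    return nullable

def remove_epsilon(productions, start):
    """Steps 2 to 4. Returns (new_productions, new_start)."""
    nullable = nullable_variables(productions)
    result = {}
    for left, rights in productions.items():
        new_rights = set()
        for right in rights:
            # Step 2: every nullable symbol may be kept or dropped.
            options = [((sym,), ()) if sym in nullable else ((sym,),)
                       for sym in right]
            for choice in product(*options):
                candidate = tuple(s for part in choice for s in part)
                if candidate:          # Step 3: never keep A -> epsilon
                    new_rights.add(candidate)
        result[left] = new_rights
    if start in nullable:              # Step 4: new start variable
        new_start = start + "'"        # assumed not to clash with V
        result[new_start] = {(), (start,)}
        return result, new_start
    return result, start

# Illustrative grammar: S -> XZ, X -> aYb | eps, Y -> bXa | bb, Z -> eps | aSa
example = {
    "S": {("X", "Z")},
    "X": {("a", "Y", "b"), ()},
    "Y": {("b", "X", "a"), ("b", "b")},
    "Z": {(), ("a", "S", "a")},
}
```

Note that the result may contain chain productions such as S → X; these are dealt with separately when a normal form is required.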
Sander Bruggink Automaten und Formale Sprachen 235
Context Free Languages Context Free Grammars and Syntax Trees
Example: Removing ε-Productions
Let G = (V, Σ, P, S), where V = {S, X, Y, Z}, Σ = {a, b} and P =
S → XZ
X → aYb | ε
Y → bXa | bb
Z → ε | aSa
Sander Bruggink Automaten und Formale Sprachen 236
Context Free Languages Context Free Grammars and Syntax Trees
Special rule for ε again
Because we can transform each grammar which is “almost” context free but contains the empty word as right side into a context free grammar, we will allow the empty word as right side of a production.
Sometimes it is convenient to assume in constructions and proofs that ε is not a right side.
Sander Bruggink Automaten und Formale Sprachen 237
Context Free Languages Context Free Grammars and Syntax Trees
Syntax Trees and Ambiguity
We consider the following example grammar that generates arithmetical expressions.
G = ({E ,T ,F}, {(, ), a,+, ∗},P,E )
with the following set of productions P:
E → T | E + T
T → F | T ∗ F
F → a | (E )
Sander Bruggink Automaten und Formale Sprachen 238
Context Free Languages Context Free Grammars and Syntax Trees
Syntax Trees and Ambiguity
For most words of the language there are several different derivations, for example for a ∗ (a + a):
E ⇒ T ⇒ T ∗ F ⇒ F ∗ F ⇒ a ∗ F ⇒ a ∗ (E)
⇒ a ∗ (E + T) ⇒ a ∗ (T + T) ⇒ a ∗ (F + T)
⇒ a ∗ (a + T) ⇒ a ∗ (a + F) ⇒ a ∗ (a + a)
E ⇒ T ⇒ T ∗ F ⇒ T ∗ (E) ⇒ T ∗ (E + T)
⇒ T ∗ (E + F) ⇒ T ∗ (E + a) ⇒ T ∗ (T + a)
⇒ T ∗ (F + a) ⇒ T ∗ (a + a) ⇒ F ∗ (a + a) ⇒ a ∗ (a + a)
The first derivation is a so-called left derivation (we always derive as far to the left as possible), while the second one is a right derivation (we always derive as far to the right as possible).
Sander Bruggink Automaten und Formale Sprachen 239
Context Free Languages Context Free Grammars and Syntax Trees
Syntax Trees and Ambiguity
Constructing a syntax tree
From the derivations we construct a syntax tree by
Labelling the root of the tree by the start variable of the grammar.
For each application of a rule A → z, adding |z| children to the node A, labelled with the symbols from z.
Syntax trees can be constructed for all derivations of context free grammars.
Sander Bruggink Automaten und Formale Sprachen 240
Context Free Languages Context Free Grammars and Syntax Trees
Syntax Trees and Ambiguity
We obtain the same syntax tree in both cases.
A grammar is called unambiguous when there is a single syntax tree for each word in the language.
A grammar is called ambiguous when there is a word which has two or more syntax trees.
[Figure: syntax tree for a ∗ (a + a) with root E.]
Sander Bruggink Automaten und Formale Sprachen 241
Context Free Languages Context Free Grammars and Syntax Trees
Syntax Trees and Ambiguity
Example of an ambiguous grammar
Let G = (V, Σ, P, S), where V = {S}, Σ = {(, )} and P =
S → (S) | SS | ε
Sander Bruggink Automaten und Formale Sprachen 242
Context Free Languages The Chomsky Normal Form
The Chomsky Normal Form
We now consider a convenient normal form.
Chomsky normal form (Definition)
A context free grammar G = (V, Σ, P, S) with ε ∉ L(G) is in Chomsky normal form when all productions have one of the following two forms:
A→ BC A→ a
Here, A,B,C ∈ V are variables and a ∈ Σ is an alphabet symbol.
Sander Bruggink Automaten und Formale Sprachen 243
Context Free Languages The Chomsky Normal Form
The Chomsky Normal Form
Transformation into Chomsky normal form (Theorem)
For each context free grammar G with ε ∉ L(G) there is a grammar G′ in Chomsky normal form with L(G) = L(G′).
Sander Bruggink Automaten und Formale Sprachen 244
Context Free Languages The Chomsky Normal Form
The Chomsky Normal Form
Procedure to transform a grammar into Chomsky normal form:
1 remove ε-productions (see slide 235)
2 remove chain productions (V1 → V2)
3 remove alphabet symbols from the right sides
4 split long right sides
Sander Bruggink Automaten und Formale Sprachen 245
Context Free Languages The Chomsky Normal Form
The Chomsky Normal Form
Step 2: Removing chain productions
There are two cases:
Case 1: A chain production is on a cycle A1 → A2 → · · · → Ak → A1 of productions. In this case we replace all variables A1, . . . , Ak by a single variable A and remove the chain productions.
Sander Bruggink Automaten und Formale Sprachen 246
Context Free Languages The Chomsky Normal Form
The Chomsky Normal Form
Step 2: Removing chain productions
Case 2: There is no cycle. In this case, the variables can be numbered A1, . . . , Ak such that Ai → Aj holds only when i < j. Now we iterate from high to low numbers (i = k−1, . . . , 1) and replace Ai → Aj by
Ai → x1 | · · · | xn,
when the rules with Aj on the left have the following form:
Aj → x1 | · · · | xn
(Introducing “shortcuts”)
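The effect of step 2 can be illustrated with a short Python sketch. Note that it uses a different but equivalent technique from the two cases above: for every variable it collects all variables reachable through chain productions and copies their non-chain right sides, which handles cycles without merging variables. The sketch assumes that ε-productions have already been removed and that every variable has an entry in the dictionary; the representation (dictionary from variables to sets of tuples) and the function name are hypothetical.

```python
def remove_chain_productions(productions):
    """productions: dict variable -> set of right sides (tuples of symbols)."""
    variables = set(productions)

    def is_chain(right):
        # A chain production has exactly one variable on the right side.
        return len(right) == 1 and right[0] in variables

    result = {}
    for a in variables:
        # All variables reachable from a using chain productions only.
        reachable, stack = {a}, [a]
        while stack:
            current = stack.pop()
            for right in productions[current]:
                if is_chain(right) and right[0] not in reachable:
                    reachable.add(right[0])
                    stack.append(right[0])
        # a inherits every non-chain right side of every reachable variable.
        result[a] = {right for b in reachable
                     for right in productions[b] if not is_chain(right)}
    return result

# Illustrative grammar: A -> B | a, B -> C | bb, C -> c
chain_example = {
    "A": {("B",), ("a",)},
    "B": {("C",), ("b", "b")},
    "C": {("c",)},
}
```

For `chain_example` the variable A ends up with the right sides a, bb and c, which are exactly the “shortcuts” described above.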
Sander Bruggink Automaten und Formale Sprachen 247
Context Free Languages The Chomsky Normal Form
The Chomsky Normal Form
Step 3: Removing alphabet symbols
When a production A → w has more than one symbol on the right side (i.e., |w| > 1), each terminal symbol a in w is replaced by a new variable Ua. Additionally, productions Ua → a are added. Afterwards, terminal symbols occur only in productions of the form A → a.
Only apply this step when |w| > 1.
Sander Bruggink Automaten und Formale Sprachen 248
Context Free Languages The Chomsky Normal Form
The Chomsky Normal Form
Step 4: Split long right sides
In the last step, productions of the form A → B1 . . . Bk with k > 2 are eliminated: add new variables C1, . . . , Ck−2, remove the original production and replace it by:
A → B1C1
C1 → B2C2
...
Ck−2 → Bk−1Bk
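Steps 3 and 4 can be sketched together in Python. This is a minimal sketch, assuming the grammar no longer contains ε-productions or chain productions; the dictionary representation and the generated names `U_a` and `C1`, `C2`, ... are hypothetical and are assumed not to clash with existing variables.

```python
def to_binary_form(productions, terminals):
    """Apply steps 3 and 4; productions: dict variable -> set of tuples."""
    result = {}
    counter = 0
    for left, rights in productions.items():
        for right in rights:
            if len(right) > 1:
                # Step 3: replace each terminal a by the new variable U_a.
                for sym in right:
                    if sym in terminals:
                        result.setdefault("U_" + sym, set()).add((sym,))
                right = tuple("U_" + s if s in terminals else s
                              for s in right)
            # Step 4: A -> B1 ... Bk becomes A -> B1 C1, C1 -> B2 C2, ...
            head = left
            while len(right) > 2:
                counter += 1
                fresh = "C" + str(counter)
                result.setdefault(head, set()).add((right[0], fresh))
                head, right = fresh, right[1:]
            result.setdefault(head, set()).add(right)
    return result

# Illustrative grammar: S -> aXb, X -> c
cnf_example = {"S": {("a", "X", "b")}, "X": {("c",)}}
```

For `cnf_example` this produces S → U_a C1, C1 → X U_b, U_a → a, U_b → b and X → c.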
Sander Bruggink Automaten und Formale Sprachen 249
Context Free Languages The Chomsky Normal Form
The Chomsky Normal Form
Example:
Let G = ({S, X, Y}, {a, b, c}, P, S) be a grammar, where P contains the following productions:
S → aXb
X → S | Y | aaSc | ε
Y → X | bbSc
Transform G into Chomsky normal form.
Sander Bruggink Automaten und Formale Sprachen 250
Context Free Languages The CYK-Algorithm
The CYK-Algorithm
The CYK algorithm: an efficient algorithm which decides whether a word w is generated by a context free grammar.
The algorithm was developed by John Cocke, Daniel Younger and Tadao Kasami.
It requires a context free grammar G in Chomsky normal form and a word w as input, and outputs whether w is in the language of G.
Sander Bruggink Automaten und Formale Sprachen 251
Context Free Languages The CYK-Algorithm
The CYK-Algorithm
Idea: Let the word x ∈ Σ∗ be given. We want to know from which variables the word can be derived.
Possibility 1: x = a ∈ Σ, that is, x consists of a single alphabet symbol. Then x can only be derived from variables A for which a production A → a exists.
Possibility 2: x = a1 . . . an with n ≥ 2. In this case a production A → BC must be applied first, and then one part a1 . . . ak of the word must be derived from B and the other part ak+1 . . . an from C (1 ≤ k < n).
Sander Bruggink Automaten und Formale Sprachen 252
Context Free Languages The CYK-Algorithm
The CYK-Algorithm
Possibility 2 is schematically represented as follows:
A
B C
a1 . . . ak ak+1 . . . an
Sander Bruggink Automaten und Formale Sprachen 253
Context Free Languages The CYK-Algorithm
The CYK-Algorithm
However, it is not clear where the word x must be split, that is, how large the index k is.
Therefore: We need to try all possible k’s. This means:
Let a word x = a1 . . . an be given. For all k with 1 ≤ k < n do the following:
Determine the set of variables V1 from which we can derive a1 . . . ak.
Determine the set of variables V2 from which we can derive ak+1 . . . an.
Check whether there are variables A, B, C with (A → BC) ∈ P, B ∈ V1 and C ∈ V2. In this case x can be derived from A.
Sander Bruggink Automaten und Formale Sprachen 254
Context Free Languages The CYK-Algorithm
The CYK-Algorithm
To avoid duplicate work, we apply methods of dynamic programming, which means:
we first determine all variables from which subwords of length 1 can be derived;
we then determine all variables from which subwords of length 2 can be derived;
. . .
finally we determine all variables from which x can be derived. When the start symbol S is among these variables, then x is in the language of the grammar.
Sander Bruggink Automaten und Formale Sprachen 255
Context Free Languages The CYK-Algorithm
The CYK-Algorithm
Notation: With xi,j we denote the subword of x that starts at position i and has length j.
x = a1 . . . an    xi,j = ai . . . ai+j−1
With this notation, the tree from before looks like this:
[Figure: A with children B and C; B derives x1,k and C derives xk+1,n−k.]
Sander Bruggink Automaten und Formale Sprachen 256
Context Free Languages The CYK-Algorithm
The CYK-Algorithm
With Ti,j we denote the set of variables from which xi,j can be derived.
Ti,j is determined as follows:
If j = 1, then
Ti,j = {A | (A → xi,j) ∈ P}
If j > 1, then
Ti,j = {A | (A → BC) ∈ P and there is a k with 1 ≤ k < j such that B ∈ Ti,k and C ∈ Ti+k,j−k}
Sander Bruggink Automaten und Formale Sprachen 257
Context Free Languages The CYK-Algorithm
The CYK-Algorithm
Practical execution of the algorithm:
We insert the sets of variables Ti,j in the following table:
           a1        a2        . . .   an−1      an
j = 1      T1,1      T2,1      . . .   Tn−1,1    Tn,1
j = 2      T1,2      T2,2      . . .   Tn−1,2
. . .      . . .     . . .
j = n − 1  T1,n−1    T2,n−1
j = n      T1,n
Sander Bruggink Automaten und Formale Sprachen 258
Context Free Languages The CYK-Algorithm
The CYK-Algorithm
The variables derive the following subwords:
        a1     a2     a3     a4     a5     a6
j = 1   T1,1   T2,1   T3,1   T4,1   T5,1   T6,1
j = 2   T1,2   T2,2   T3,2   T4,2   T5,2
j = 3   T1,3   T2,3   T3,3   T4,3
j = 4   T1,4   T2,4   T3,4
j = 5   T1,5   T2,5
j = 6   T1,6
Sander Bruggink Automaten und Formale Sprachen 259
Context Free Languages The CYK-Algorithm
The CYK-Algorithm
To fill in T1,6 for x = a1 . . . a6, all five splits of x are tried:
x = a1a2a3a4a5 | a6:    (A → BC) ∈ P, B ∈ T1,5, C ∈ T6,1 ⇒ A ∈ T1,6
x = a1a2a3a4 | a5a6:    (A → BC) ∈ P, B ∈ T1,4, C ∈ T5,2 ⇒ A ∈ T1,6
x = a1a2a3 | a4a5a6:    (A → BC) ∈ P, B ∈ T1,3, C ∈ T4,3 ⇒ A ∈ T1,6
x = a1a2 | a3a4a5a6:    (A → BC) ∈ P, B ∈ T1,2, C ∈ T3,4 ⇒ A ∈ T1,6
x = a1 | a2a3a4a5a6:    (A → BC) ∈ P, B ∈ T1,1, C ∈ T2,5 ⇒ A ∈ T1,6
Sander Bruggink Automaten und Formale Sprachen 260
Context Free Languages The CYK-Algorithm
The CYK-Algorithm
input G = (V, Σ, P, S), x ∈ Σ∗
n := |x|
for i ∈ {1, . . . , n} do                  // upper row, subword length = 1
    Ti,1 := {A | (A → xi,1) ∈ P}
end
for j ∈ {2, . . . , n} do                  // rest of table, subword length ≥ 2
    for i ∈ {1, . . . , n − j + 1} do      // start of the subword
        Ti,j := ∅
        for k ∈ {1, . . . , j − 1} do      // try out possible splits
            Ti,j := Ti,j ∪ {A | (A → BC) ∈ P for some B ∈ Ti,k, C ∈ Ti+k,j−k}
        end
    end
end
if S ∈ T1,n then return true else return false
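The pseudocode translates almost line by line into Python. The following is a minimal sketch, assuming the grammar is given in Chomsky normal form as a dictionary from variables to sets of right sides (1-tuples for A → a, 2-tuples for A → BC); this representation, the function name `cyk` and the variable `example1` are hypothetical. The grammar stored in `example1` is the one from Example 1 of this section.

```python
def cyk(productions, start, x):
    """Decide whether x is generated by a grammar in Chomsky normal form."""
    n = len(x)
    if n == 0:
        return False  # a grammar in Chomsky normal form never generates epsilon
    # T[(i, j)]: variables deriving the subword that starts at position i
    # (1-based) and has length j.
    T = {}
    for i in range(1, n + 1):
        T[(i, 1)] = {A for A, rights in productions.items()
                     if (x[i - 1],) in rights}
    for j in range(2, n + 1):              # subword length
        for i in range(1, n - j + 2):      # start of the subword
            cell = set()
            for k in range(1, j):          # try out possible splits
                for A, rights in productions.items():
                    for right in rights:
                        if (len(right) == 2
                                and right[0] in T[(i, k)]
                                and right[1] in T[(i + k, j - k)]):
                            cell.add(A)
            T[(i, j)] = cell
    return start in T[(1, n)]

example1 = {
    "S": {("A", "D"), ("F", "G")},
    "D": {("S", "E"), ("B", "C")},
    "E": {("B", "C")},
    "F": {("A", "F"), ("a",)},
    "G": {("B", "G"), ("C", "G"), ("b",)},
    "H": {("S", "C")},
    "A": {("a",)},
    "B": {("b",)},
    "C": {("c",)},
}
```

A call such as `cyk(example1, "S", "ab")` then answers the word problem for the word `ab`; the three nested loops over j, i and k mirror the structure of the table.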
Sander Bruggink Automaten und Formale Sprachen 261
Context Free Languages The CYK-Algorithm
The CYK-Algorithm
Example 1: Consider a context free grammar with the following productions:
S → AD | FG
D → SE | BC
E → BC
F → AF | a
G → BG | CG | b
H → SC
A→ a
B → b
C → c
Question: Let x = aabcbc . Is x ∈ L?
Sander Bruggink Automaten und Formale Sprachen 262
Context Free Languages The CYK-Algorithm
The CYK-Algorithm
Example 2: Consider a context free grammar with the following productions:
S → AB
A→ ab | aAb
B → c | cB
Question: Let x = aaabbbcc. Is x ∈ L?
Sander Bruggink Automaten und Formale Sprachen 263
Context Free Languages The CYK-Algorithm
The CYK-Algorithm
Complexity of the CYK-Algorithm
Let n = |x | be the length of the input word. Then:
We need fewer than n^2 boxes in the table.
In order to fill in a box of the table, up to 2 · n other boxes must be considered.
The algorithm makes fewer than c · n^3 steps, where c is a constant. This is called cubic run time.
Sander Bruggink Automaten und Formale Sprachen 264
Context Free Languages The CYK-Algorithm
The CYK-Algorithm
The CYK algorithm is one of the most efficient algorithms that work for arbitrary context free grammars.
In practice, however, it is too slow to parse, for example, large Java programs.
There are also procedures which are much more efficient. However, they can only be used on a subclass of the context free languages. In practice, recursive descent parsers and LR(k)-parsers are often used.
Sander Bruggink Automaten und Formale Sprachen 265
Context Free Languages Pumping Lemma for Context Free Languages
Pumping Lemma
How do we show that a language is not context free?
⇒ Pumping lemma for context free languages.
The statement
Every sufficiently long word uses a state twice.
for regular languages is replaced by
In a path of the syntax tree that represents the derivation of a sufficiently long word in a context free grammar, one of the variables occurs at least twice.
Sander Bruggink Automaten und Formale Sprachen 266
Context Free Languages Pumping Lemma for Context Free Languages
How Many Leaves Does a Syntax Tree Have?
How many leaves does a syntax tree of depth n have?
Depth = 0: 2^0 = 1 leaf
Depth = 1: 2^1 = 2 leaves
Depth = 2: 2^2 = 4 leaves
Depth = 3: 2^3 = 8 leaves
In general: depth = n: at most 2^n leaves
[Figure: binary syntax tree with root S whose levels correspond to the depths listed above.]
For a grammar in Chomsky normal form.
Conversely: a tree with 2^n leaves has at least depth n.
Sander Bruggink Automaten und Formale Sprachen 267
Context Free Languages Pumping Lemma for Context Free Languages
Pumping Lemma
This means:
Let G be a grammar in Chomsky normal form and k the number of variables in G.
For a word z ∈ L(G) with |z| > 2^k, the corresponding syntax tree has more than 2^k leaves.
This means that the depth of the syntax tree is larger than k (not counting the level of the leaves), and therefore there exists a path from the root to a leaf of length at least k + 1.
On this path there are at least k + 1 variables. Since G has only k variables, at least one variable occurs on this path at least twice.
Sander Bruggink Automaten und Formale Sprachen 268
Context Free Languages Pumping Lemma for Context Free Languages
Pumping Lemma
Syntax tree for a word z with |z| ≥ n = 2^k
Here, n is the “pumping lemma constant”.
[Figure: syntax tree with root S above the word z; the bottom level is the level of the leaves (last derivation step).]
Sander Bruggink Automaten und Formale Sprachen 269
Context Free Languages Pumping Lemma for Context Free Languages
Pumping Lemma
There is a path with at least k + 1 inner nodes.
On this path some variable occurs twice. For example, A.
The word z is decomposed into five subwords u, v, w, x, y:
w is derived by the lower A: A ⇒∗ w
vwx is derived by the upper A: A ⇒∗ vwx
[Figure: syntax tree with root S over the word z = uvwxy; the upper A spans vwx and the lower A spans w.]
We obtain three smaller syntax trees that can be put together again in different ways.
By removing the middle subtree one obtains a syntax tree for uwy. Therefore, uwy ∈ L holds.
By duplicating the middle subtree one obtains a syntax tree for uv^2wx^2y. Therefore, uv^2wx^2y ∈ L holds.
Sander Bruggink Automaten und Formale Sprachen 269
Context Free Languages Pumping Lemma for Context Free Languages
Pumping Lemma
Additionally, one can require the following properties for v, w, x:
|vwx| ≤ n = 2^k
|vx| ≥ 1
[Figure: syntax tree with root S over z = uvwxy; the upper A spans vwx and the lower A spans w.]
Sander Bruggink Automaten und Formale Sprachen 270
Context Free Languages Pumping Lemma for Context Free Languages
Pumping Lemma
Pumping Lemma, uvwxy -Theorem (Theorem)
Let L be a context free language. There is a number n such that all words z ∈ L with |z| > n can be decomposed into z = uvwxy such that the following properties hold:
1 |vx| ≥ 1,
2 |vwx| ≤ n and
3 for all i = 0, 1, 2, . . . it holds that uv^i wx^i y ∈ L.
Here, n = 2^|V| comes from the number of variables of a context free grammar in Chomsky normal form for L.
Sander Bruggink Automaten und Formale Sprachen 271
Context Free Languages Pumping Lemma for Context Free Languages
Pumping Lemma
Pumping Lemma (alternative phrasing)
Let L be a language. Assume that for each number n a word z ∈ L with |z| > n can be chosen such that for all decompositions z = uvwxy with
1 |vx| ≥ 1 and
2 |vwx| ≤ n
there is a number i such that uv^i wx^i y ∉ L. Then L is not context free.
Sander Bruggink Automaten und Formale Sprachen 272
Context Free Languages Pumping Lemma for Context Free Languages
Pumping Lemma
Let L be a language. We want to use the pumping lemma to show that L is not context free.
Cooking recipe for the pumping lemma
1 Take an arbitrary number n.
2 Choose a word z ∈ L with |z| > n.
3 Consider all possible decompositions z = uvwxy with the restrictions that |vx| ≥ 1 and |vwx| ≤ n.
4 When for each such decomposition there is a number i such that uv^i wx^i y ∉ L, then the language L is not context free.
Sander Bruggink Automaten und Formale Sprachen 273
Context Free Languages Pumping Lemma for Context Free Languages
Pumping-Lemma
With the help of the pumping lemma, we can now show that the following languages are not context free:
Example 1:
L1 = {a^m b^(m^2) | m > 0}
Example 2:
L2 = {a^m b^m c^m | m > 0}
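For Example 2 the recipe can be tried out mechanically for one fixed n. The following Python sketch enumerates all decompositions z = uvwxy of z = a^n b^n c^n with |vx| ≥ 1 and |vwx| ≤ n and checks that each of them leaves the language for some small i. This is only an illustration for a single n and a bounded range of i, not a proof (a proof has to argue for every n); the function names and the bound `max_i` are hypothetical.

```python
def in_L2(word):
    """Membership test for L2 = { a^m b^m c^m | m > 0 }."""
    m = len(word) // 3
    return m > 0 and word == "a" * m + "b" * m + "c" * m

def every_decomposition_fails(z, n, member, max_i=3):
    """True iff each split z = uvwxy with |vx| >= 1 and |vwx| <= n has
    some i in 0..max_i such that u v^i w x^i y is not in the language."""
    for start in range(len(z) + 1):                           # start of v
        for end in range(start, min(start + n, len(z)) + 1):  # end of x
            for v_end in range(start, end + 1):
                for x_start in range(v_end, end + 1):
                    u, v, w = z[:start], z[start:v_end], z[v_end:x_start]
                    x, y = z[x_start:end], z[end:]
                    if len(v) + len(x) == 0:
                        continue                  # violates |vx| >= 1
                    if all(member(u + v * i + w + x * i + y)
                           for i in range(max_i + 1)):
                        return False              # this split can be pumped
    return True

def in_anbn(word):
    """Membership test for the context free language { a^m b^m | m > 0 }."""
    m = len(word) // 2
    return m > 0 and word == "a" * m + "b" * m
```

For n = 3 the call `every_decomposition_fails("aaabbbccc", 3, in_L2)` succeeds, whereas for the context free language {a^m b^m | m > 0} the corresponding check on `"aaabbb"` finds a decomposition (v = a, x = b) that can be pumped.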
Sander Bruggink Automaten und Formale Sprachen 274
Context Free Languages Push Down Automata
Push Down Automata
What is a fitting automaton model for context free languages?
Analogously to regular languages we want to have an automaton model for context free languages.
Answer: Push-down automata (German: Kellerautomaten)
Automata that are equipped with an additional stack.
Sander Bruggink Automaten und Formale Sprachen 275
Context Free Languages Push Down Automata
Push-Down Automata
We consider the language
L = {w$w^R | w ∈ {a, b, c, d}∗}
with Σ = {a, b, c, d, $}.
Here w^R is the reverse of the word w (for example: (abc)^R = cba).
Sander Bruggink Automaten und Formale Sprachen 276
Context Free Languages Push Down Automata
Push-Down Automata
To obtain an automaton model for context free languages,
we introduce a stack or push-down memory, on which a sequence of symbols of arbitrary length can be stored
during the reading of a new symbol the top symbol on the stack can be read and changed in the following ways:
either the stack is not changed, or
the top symbol on the stack is removed (pop) and replaced by a sequence of symbols (push).
The stack cannot be read or changed anywhere else.
Sander Bruggink Automaten und Formale Sprachen 277
Context Free Languages Push Down Automata
Push-Down Automata
Schematic representation of a push-down automaton:
[Figure: a push-down automaton reading an input tape, connected to a stack holding A, B, C on top of the stack bottom symbol #.]
Sander Bruggink Automaten und Formale Sprachen 278
Context Free Languages Push Down Automata
Push-Down Automata
We consider the language
L = {w$w^R | w ∈ {a, b, c, d}∗}
with Σ = {a, b, c, d, $}.
A push-down automaton recognizes the language in the following way:
A word w is read from left to right.
The automaton has two states:
State 1: Store the first part of the word.
State 2: Check the second part of the word.
Sander Bruggink Automaten und Formale Sprachen 279
Context Free Languages Push Down Automata
Push-Down Automata
State 1: As long as $ has not been reached: push each symbol read as a capital letter onto the stack (a ↦ A, b ↦ B, . . . ).
When $ is read: don’t change the stack and change to state 2.
State 2: check for each symbol read whether the corresponding capital letter is the top symbol on the stack. This symbol is removed.
When the read symbol and the top stack symbol don’t correspond, there are no possible transitions. The push-down automaton blocks and the word is not accepted.
When the two always correspond: the stack bottom symbol # is removed and the automaton accepts the word with an empty stack.
Sander Bruggink Automaten und Formale Sprachen 280
Context Free Languages Push Down Automata
Push-Down Automata
Simulation on the input acad$daca (the top of the stack is written on the left):
State   Remaining input   Stack
1       acad$daca         #
1       cad$daca          A#
1       ad$daca           CA#
1       d$daca            ACA#
1       $daca             DACA#
2       daca              DACA#
2       aca               ACA#
2       ca                CA#
2       a                 A#
2       ε                 #
2       ε                 ε
Sander Bruggink Automaten und Formale Sprachen 281
Context Free Languages Push Down Automata
Push-Down Automata
Push-down automaton (Definition)
A (non-deterministic) push-down automaton M is a 6-tuple M = (Z, Σ, Γ, δ, z0, #), where
Z is the set of states,
Σ is the input alphabet (with Z ∩ Σ = ∅),
Γ is the stack alphabet,
z0 ∈ Z is the initial state,
δ : Z × (Σ ∪ {ε})× Γ→ Pe(Z × Γ∗) is the transition function and
# ∈ Γ is the stack bottom symbol.
Sander Bruggink Automaten und Formale Sprachen 282
Context Free Languages Push Down Automata
Push-Down Automata
Remarks about push-down automata:
Z, Σ and Γ must be finite sets again.
Pe(Z × Γ∗) denotes the set of all finite subsets of Z × Γ∗.
Abbreviation: PDA (pushdown automaton) or KA (Kellerautomat)
Sander Bruggink Automaten und Formale Sprachen 283
Context Free Languages Push Down Automata
Push Down Automata
Consider the transition function
δ : Z × (Σ ∪ {ε})× Γ→ Pe(Z × Γ∗)
When (z′, B1 . . . Bk) ∈ δ(z, a, A) this means that
when the input symbol a is read in state z and the symbol A is the top symbol on the stack, then A is removed from the stack and replaced by B1 . . . Bk (B1 is the top symbol) and the automaton transfers to state z′.
It is also possible that a = ε. In this case no input symbol is read.
Sander Bruggink Automaten und Formale Sprachen 284
Context Free Languages Push Down Automata
Push-Down Automata
We consider several cases of values of the transition function δ:
(z ′, ε) ∈ δ(z , a,A)
Symbol a is read
State changes from z to z ′
Symbol A is removed from the stack.
[Figure: stack with A on top before the transition; A removed afterwards.]
Sander Bruggink Automaten und Formale Sprachen 285
Context Free Languages Push Down Automata
Push-Down Automata
(z ′,B) ∈ δ(z , a,A)
Symbol a is read
State changes from z to z ′
Symbol A on the stack is replaced by B.
[Figure: stack with A on top before the transition; B on top afterwards.]
Sander Bruggink Automaten und Formale Sprachen 286
Context Free Languages Push Down Automata
Push-Down Automata
(z ′,A) ∈ δ(z , a,A)
Symbol a is read
State changes from z to z ′
Symbol A stays on the stack.
[Figure: stack with A on top before and after the transition.]
Sander Bruggink Automaten und Formale Sprachen 287
Context Free Languages Push Down Automata
Push-Down Automata
(z ′,BA) ∈ δ(z , a,A)
Symbol a is read
State changes from z to z ′
Symbol B is put on the stack.
[Figure: stack with A on top before the transition; B on top of A afterwards.]
Sander Bruggink Automaten und Formale Sprachen 288
Context Free Languages Push Down Automata
Push-Down Automata
(z ′,B1 . . .Bk) ∈ δ(z , a,A)
Symbol a is read
State changes from z to z ′
Symbol A is replaced by several symbols.
[Figure: stack with A on top before the transition; B1 . . . Bk on top afterwards, with B1 as the top symbol.]
Sander Bruggink Automaten und Formale Sprachen 289
Context Free Languages Push Down Automata
Push-Down Automata
Configuration (definition)
A configuration of a push-down automaton is a triple
k ∈ Z × Σ∗ × Γ∗.
Meaning of the components of k = (z, w, γ) ∈ Z × Σ∗ × Γ∗:
z ∈ Z is the current state of the push-down automaton.
w ∈ Σ∗ is the part of the input which still needs to be read.
γ ∈ Γ∗ is the current stack content. The top stack symbol is written on the left.
Push-Down Automata
Transitions between configurations arise from the transition function δ:
Configuration transitions (definition)
It holds that
(z, aw, Aγ) ⊢ (z′, w, B1 . . . Bkγ)
when (z′, B1 . . . Bk) ∈ δ(z, a, A), and it holds that
(z, w, Aγ) ⊢ (z′, w, B1 . . . Bkγ)
when (z′, B1 . . . Bk) ∈ δ(z, ε, A).
In the first case a symbol is read from the input; in the second case no input is read.
Push-Down Automata
Accepted language (definition)
Let M = (Z, Σ, Γ, δ, z0, #) be a push-down automaton. Then the language accepted by M is:
N(M) = {x ∈ Σ∗ | (z0, x, #) ⊢∗ (z, ε, ε) for some z ∈ Z}.
The accepted language consists of all words for which it is possible to read the whole input and completely empty the stack. Since push-down automata are non-deterministic, an accepted word may also have computations that do not end with an empty stack; one successful computation suffices.
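The definition of N(M) can be tried out with a small simulator. The following is only a sketch under stated assumptions: the dictionary encoding of δ, the function name accepts and the bound max_stack are choices made for this illustration, not part of the lecture. The bound keeps the search finite and may reject words for automata whose stack has to grow beyond it. The example automaton is one for {a^n b^n | n ≥ 1}.

```python
from collections import deque

def accepts(delta, start_state, bottom, word, max_stack=None):
    """Return True if some computation reads all of `word` and empties the stack.

    delta maps (state, input symbol or "" for ε, top stack symbol) to a set of
    (new state, string pushed in place of the top symbol, top on the left).
    """
    if max_stack is None:
        max_stack = 2 * len(word) + 2  # assumed bound that keeps the search finite
    start = (start_state, 0, bottom)   # configuration: (state, input position, stack)
    seen = {start}
    queue = deque([start])
    while queue:
        state, pos, stack = queue.popleft()
        if pos == len(word) and stack == "":
            return True                # reached a configuration (z, ε, ε)
        if stack == "":
            continue                   # empty stack: no transition applies
        top, rest = stack[0], stack[1:]
        # ε-moves keep the input position, reading moves advance it by one
        steps = [(pos, move) for move in delta.get((state, "", top), ())]
        if pos < len(word):
            steps += [(pos + 1, move) for move in delta.get((state, word[pos], top), ())]
        for new_pos, (new_state, pushed) in steps:
            config = (new_state, new_pos, pushed + rest)
            if len(config[2]) <= max_stack and config not in seen:
                seen.add(config)
                queue.append(config)
    return False

# Illustrative automaton for {a^n b^n | n >= 1}
DELTA = {
    ("z1", "a", "#"): {("z1", "A#")},
    ("z1", "a", "A"): {("z1", "AA")},
    ("z1", "b", "A"): {("z2", "")},
    ("z2", "b", "A"): {("z2", "")},
    ("z2", "", "#"): {("z2", "")},
}
```

For example, accepts(DELTA, "z1", "#", "aabb") explores the configurations breadth-first and stops as soon as one computation has emptied the stack.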
Push-Down Automata
At the start of each computation the stack contains exactly the stack bottom symbol #.
The stack is unbounded and can grow arbitrarily large. There are infinitely many possible stacks; this distinguishes push-down automata from finite automata.
The push-down automata we consider always accept with an empty stack. Variants exist which have final states, like finite automata.
Push-Down Automata: Examples
Example 1: Let Σ = {a, b, $}. Give a push-down automaton M for the language N(M) = {w$w^R | w ∈ {a, b}∗}.
Example 2: Let Σ = {a, b}. Give a push-down automaton M′ for the language N(M′) = {ww^R | w ∈ {a, b}∗}.
Example 3: Let Σ = {a, b}. Give a push-down automaton K for the language N(K) = {a^n b^m | 1 ≤ n ≤ m}.
Push-Down Automata
Now we have to show that push-down automata really accept exactly the context-free languages.
Context free grammar → push-down automaton (theorem)
For each context free grammar G there is a push-down automaton M with L(G) = N(M).
Push-Down Automata
Proof idea:
1 Use the stack to simulate the grammar. Derive a word of the language on the stack (guessing the productions non-deterministically) and check whether the word corresponds to the input.
2 Problem: the stack cannot be used arbitrarily; we can only replace the top stack symbol. Solution: remove already derived parts of the word from the stack, by comparing them with the input and removing them when they correspond.
3 Thus one can make sure that the next variable to be replaced is always on top of the stack.
Push-Down Automata
More formally: let G = (V, Σ, P, S) be a context free grammar. We define the push-down automaton
M = ({z}, Σ, V ∪ Σ, δ, z, S)
with a single state z and the stack alphabet V ∪ Σ. The start variable S is the stack bottom symbol.
Transition function δ:
For each production (A → α) ∈ P with α ∈ (V ∪ Σ)∗, add (z, α) to the set δ(z, ε, A). (Derivation step on the stack without reading from the input.)
Additionally, add (z, ε) to δ(z, a, a), for each a ∈ Σ. (Comparing stack and input.)
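The construction of δ can be written down directly. The sketch below assumes that every variable and every terminal is a single character, so that right-hand sides can be stored as strings; the function name grammar_to_pda and the dictionary encoding of δ are choices made for this illustration. It is applied to the grammar S → [S]S | ε for correctly parenthesized words.

```python
def grammar_to_pda(variables, terminals, productions, start):
    """Build the single-state PDA (accepting with an empty stack) for a grammar.

    productions maps a variable to a list of right-hand sides, each a string
    over V ∪ Σ ("" stands for ε). The result is (delta, state, bottom symbol),
    where delta maps (state, input symbol or "", top) to a set of
    (state, pushed string) with the top of the stack on the left.
    """
    z = "z"
    delta = {}
    # derivation step on the stack: replace A by α without reading input
    for var in variables:
        for rhs in productions.get(var, []):
            delta.setdefault((z, "", var), set()).add((z, rhs))
    # comparing stack and input: pop terminal a while reading a
    for a in terminals:
        delta.setdefault((z, a, a), set()).add((z, ""))
    return delta, z, start

DELTA, STATE, BOTTOM = grammar_to_pda({"S"}, {"[", "]"}, {"S": ["[S]S", ""]}, "S")
```

Note that the resulting automaton has ε-moves that can grow the stack without reading input, so a simulator for it needs some bound on the stack height in order to terminate.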
Push-Down Automata
We consider the following context free grammar with the two-element alphabet Σ = {[, ]}, which generates correctly parenthesized structures:
S → [S]S | ε
Exercise: transform the grammar into a push-down automaton and show that it accepts the word [ [ ] ] [ ].
Simulation of the PDA on the word [ [ ] ] [ ]
Grammar:
S → [S]S
S → ε
The automaton stays in state z throughout and passes through the following configurations (state, remaining input, stack with the top symbol on the left):
(z, [ [ ] ] [ ], S)
(z, [ [ ] ] [ ], [S]S)
(z, [ ] ] [ ], S]S)
(z, [ ] ] [ ], [S]S]S)
(z, ] ] [ ], S]S]S)
(z, ] ] [ ], ]S]S)
(z, ] [ ], S]S)
(z, ] [ ], ]S)
(z, [ ], S)
(z, [ ], [S]S)
(z, ], S]S)
(z, ], ]S)
(z, ε, S)
(z, ε, ε)
Push-Down Automaton → Context Free Grammar
Now, we want to show that for each push-down automaton there exists an equivalent context-free grammar.
Push-down automaton → Context free grammar (theorem)
For each push-down automaton M there exists a context free grammar G with N(M) = L(G).
Push-Down Automaton → Context Free Grammar
Proof idea:
We want to describe which words can be read while a single symbol is removed from the stack. The language accepted by the automaton consists of all words that can be read while # is removed from the stack.
Removing from the stack means: in between, other symbols may be put on the stack, but in the end the stack must be one symbol smaller.
[Figure: height of the stack plotted against the number of input symbols read. At the start A is the top stack symbol; in between A can be replaced (for example by B), but the stack does not get lower; the word a1 a2 . . . ak is read until the stack is lower for the first time, that is, it is the word read by removing A from the stack.]
Push-Down Automaton → Context Free Grammar
Which productions are needed?
There are two possibilities (where a ∈ Σ ∪ {ε}):
A is directly removed after reading a:
⇒ production of the form “A” → a
After reading a, A is replaced by B1 . . . Bn. Now we have to remove B1, . . . , Bn and concatenate the generated words.
⇒ production of the form “A” → a “B1” · · · “Bn”
But how do we also take the states of the push-down automaton into account?
Push-Down Automaton → Context Free Grammar
Question: How do we take the states of the push-down automaton into account?
Answer: We will generate a context free grammar with variables of the form (z, A, z′), meaning:
From (z, A, z′) we can derive exactly those words which the push-down automaton can read when it starts in state z, removes A from the stack, and ends in state z′.
(z, A, z′) ⇒∗ x ⇐⇒ (z, x, A) ⊢∗ (z′, ε, ε)
[Figure: height of the stack plotted against the number of input symbols read. The PDA is in state z when A is the top stack symbol and in state z′ when the stack is lower for the first time; the word a1 a2 . . . ak read in between is the word generated from the variable (z, A, z′).]
Push-Down Automaton → Context Free Grammar
Formal definition:
Let a push-down automaton M = (Z, Σ, Γ, δ, z0, #) be given. We define a grammar G = (V, Σ, P, S) as follows:
Variables: V = {S} ∪ (Z × Γ × Z) (a new start variable S and variables of the form (z1, A, z2))
Push-Down Automata
The grammar consists of productions of the following forms, where a ∈ Σ ∪ {ε}:
S → (z0, #, z) for all z ∈ Z
(Begin)
(z, A, z′) → a when (z′, ε) ∈ δ(z, a, A)
(Symbol A can be removed immediately when reading a.)
(z, A, z′) → a (z1, B1, z2)(z2, B2, z3) . . . (zk, Bk, z′) when (z1, B1 . . . Bk) ∈ δ(z, a, A), for all z′, z2, . . . , zk ∈ Z
(Symbol A is replaced by B1 . . . Bk; these must be removed one after another, passing through the states z1, . . . , zk and ending in z′.)
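The three kinds of productions can be generated mechanically. The following sketch assumes single-character stack symbols and encodes variables (z, A, z′) as Python tuples; the name pda_to_grammar and the encoding of δ as a dictionary are choices made for this illustration. It is applied to a two-state automaton for {a^n b^n | n ≥ 1}.

```python
from itertools import product

def pda_to_grammar(states, delta, start_state, bottom):
    """Triple construction; returns a list of productions (lhs, rhs).

    delta maps (state, input symbol or "" for ε, top) to a set of
    (state, pushed string). A right-hand side is a tuple that contains
    terminals (str) and variables (3-tuples (z, A, z')).
    """
    productions = [("S", ((start_state, bottom, z),)) for z in sorted(states)]
    for (z, a, top), moves in sorted(delta.items()):
        prefix = (a,) if a else ()          # ε contributes no terminal
        for z1, pushed in sorted(moves):
            if pushed == "":
                # A is removed immediately; the automaton ends in z1
                productions.append(((z, top, z1), prefix))
                continue
            k = len(pushed)
            # guess the intermediate states z2..zk and the final state z'
            for later in product(sorted(states), repeat=k):
                chain = (z1,) + later
                body = tuple((chain[i], pushed[i], chain[i + 1]) for i in range(k))
                productions.append(((z, top, later[-1]), prefix + body))
    return productions

DELTA = {
    ("z1", "a", "#"): {("z1", "A#")},
    ("z1", "a", "A"): {("z1", "AA")},
    ("z1", "b", "A"): {("z2", "")},
    ("z2", "b", "A"): {("z2", "")},
    ("z2", "", "#"): {("z2", "")},
}
PRODUCTIONS = pda_to_grammar({"z1", "z2"}, DELTA, "z1", "#")
```

Many of the generated productions are useless, because they contain variables from which no word can be derived; removing them is a separate clean-up step that this sketch does not perform.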
Push-Down Automata
Example: Consider the push-down automaton
M = ({z1, z2}, {a, b}, {A,#}, δ, z1,#)
with the following transition function δ:
(z1, a,#)→ (z1,A#)
(z1, a,A)→ (z1,AA)
(z1, b,A)→ (z2, ε)
(z2, b,A)→ (z2, ε)
(z2, ε,#)→ (z2, ε)
It holds that: N(M) = {a^n b^n | n ≥ 1}.
Exercise: Transform M into a context free grammar.
Push-Down Automata
Remark on the translations “context free grammar ↔ push-down automaton”:
For each push-down automaton there exists an equivalent push-down automaton with a single state.
To construct it, transform the push-down automaton into a context free grammar and then back again into a push-down automaton. This works because the translation from context free grammars to push-down automata always produces push-down automata with a single state.
Interlude: Push-Down Automata with Final States
In the literature push-down automata are often defined in such a way that they have final states and accept by reaching a final state rather than with an empty stack.
Push-down automaton with final states (definition)
A push-down automaton with final states M is a 7-tuple M = (Z, Σ, Γ, δ, z0, #, E), where
(Z, Σ, Γ, δ, z0, #) is a push-down automaton,
E ⊆ Z is the set of final states.
Interlude: Push-Down Automata with Final States
Accepted language of a push-down automaton with final states (definition)
Let M = (Z, Σ, Γ, δ, z0, #, E) be a push-down automaton with final states. The language accepted by M is:
N(M) = {x ∈ Σ∗ | (z0, x, #) ⊢∗ (z, ε, γ) for some z ∈ E, γ ∈ Γ∗}.
The following differences to acceptance with an empty stack exist:
The state which is reached in the end is required to be a final state.
In the end the stack is not necessarily empty.
Interlude: Push-Down Automata with Final States
Theorem
A language is accepted by a push-down automaton with final states if and only if it is accepted by a push-down automaton (with an empty stack).
(Proof omitted.)
Interlude: Push-Down Automata with Final States
Example: L = {ww^R | w ∈ {a, b}∗}.
M = ({z1, z2, z3}, {a, b}, {A, B, #}, δ, z1, #, {z3}), where δ is given by:
(z1, a, #) → (z1, A#)    (z1, a, A) → (z1, AA)    (z1, a, B) → (z1, AB)
(z1, b, #) → (z1, B#)    (z1, b, A) → (z1, BA)    (z1, b, B) → (z1, BB)
(z1, ε, #) → (z2, #)     (z1, a, A) → (z2, ε)     (z1, b, B) → (z2, ε)
(z2, a, A) → (z2, ε)     (z2, b, B) → (z2, ε)     (z2, ε, #) → (z3, #)
Context Free Languages Closure Properties
Closure Properties
Are the context free languages closed under the following operations?
Union (L1, L2 context free ⇒ L1 ∪ L2 context free) ?
Product/Concatenation (L1, L2 context free ⇒ L1L2 context free) ?
Star-Operation (L context free ⇒ L∗ context free) ?
Intersection (L1, L2 context free ⇒ L1 ∩ L2 context free) ?
Complement (L context free ⇒ Σ∗ \ L context free) ?
Closure Properties
Closure under union
When L1 and L2 are context free languages, then L1 ∪ L2 is also context free.
Proof: Let context free grammars
G1 = (V1, Σ, P1, S1) and G2 = (V2, Σ, P2, S2)
(with V1 ∩ V2 = ∅) be given for L1 = L(G1), L2 = L(G2). The grammar
G = (V1 ∪ V2 ∪ {S}, Σ, P1 ∪ P2 ∪ {S → S1, S → S2}, S)
where S ∉ V1 ∪ V2, is a context free grammar for L1 ∪ L2.
Closure Properties
Closure under product/concatenation
When L1 and L2 are context free languages, then L1L2 is also context free.
Proof: Let context free grammars
G1 = (V1, Σ, P1, S1) and G2 = (V2, Σ, P2, S2)
(with V1 ∩ V2 = ∅) be given for L1 = L(G1), L2 = L(G2). Then
G = (V1 ∪ V2 ∪ {S}, Σ, P1 ∪ P2 ∪ {S → S1S2}, S)
where S ∉ V1 ∪ V2, is a context free grammar for L1L2.
Closure Properties
Closure under the Star Operation
When L is a context free language, then L∗ is also context free.
Proof: Let a context free grammar
G1 = (V1,Σ,P1, S1)
for L = L(G1) be given. Then
G = (V1 ∪ {S}, Σ, P1 ∪ {S → ε, S → S1S}, S)
where S ∉ V1, is a context free grammar for L∗.
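The three grammar constructions (union, concatenation, star) only add a new start variable and one or two productions. A minimal sketch, assuming a grammar is stored as a triple (variables, productions, start) where productions maps each variable to a list of right-hand sides given as tuples of symbols; the function names and the way a fresh start variable is picked are choices made for this illustration.

```python
def fresh_start(*variable_sets):
    """Pick a start variable name that is not used in any of the given sets."""
    name = "S"
    while any(name in variables for variables in variable_sets):
        name += "'"
    return name

def union(g1, g2):
    (v1, p1, s1), (v2, p2, s2) = g1, g2
    assert not (v1 & v2), "rename variables first so that V1 and V2 are disjoint"
    s = fresh_start(v1, v2)
    return v1 | v2 | {s}, {**p1, **p2, s: [(s1,), (s2,)]}, s   # S → S1 | S2

def concatenation(g1, g2):
    (v1, p1, s1), (v2, p2, s2) = g1, g2
    assert not (v1 & v2), "rename variables first so that V1 and V2 are disjoint"
    s = fresh_start(v1, v2)
    return v1 | v2 | {s}, {**p1, **p2, s: [(s1, s2)]}, s       # S → S1 S2

def star(g1):
    v1, p1, s1 = g1
    s = fresh_start(v1)
    return v1 | {s}, {**p1, s: [(), (s1, s)]}, s               # S → ε | S1 S

# Illustrative grammars: A → aAb | ε and B → cB | ε
G1 = ({"A"}, {"A": [("a", "A", "b"), ()]}, "A")
G2 = ({"B"}, {"B": [("c", "B"), ()]}, "B")
```

The assertion on disjoint variable sets mirrors the condition V1 ∩ V2 = ∅ from the proofs; without it the productions of the two grammars could interfere with each other.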
Closure Properties
No closure under intersection
When L1 and L2 are context free languages, then L1 ∩ L2 is not necessarily context free.
Counter example: The languages
L1 = {a^j b^k c^k | j ≥ 0, k ≥ 0}
L2 = {a^k b^k c^j | j ≥ 0, k ≥ 0}
are context free. Their intersection
L1 ∩ L2 = {a^k b^k c^k | k ≥ 0}
is, however, not context free.
Closure Properties
Closure under intersection with regular languages
Let L be a context free language and R a regular language. Then L ∩ R is a context free language.
Idea: Similar to the construction of a cross product automaton of two finite automata, we can build the cross product of a push-down automaton and a finite automaton, which accepts the intersection of the two languages.
Closure Properties
Proof idea: Construction of a push-down automaton M′ (with final states) for L ∩ R from a push-down automaton with final states M = (Z1, Σ, Γ, δ1, z0^1, #, E1) for L and a deterministic finite automaton A = (Z2, Σ, δ2, z0^2, E2) for R:
M′ = (Z1 × Z2, Σ, Γ, δ, (z0^1, z0^2), #, E1 × E2)
with
((z1′, z2′), B1 . . . Bk) ∈ δ((z1, z2), a, A), when (z1′, B1 . . . Bk) ∈ δ1(z1, a, A) and δ2(z2, a) = z2′
((z1′, z2), B1 . . . Bk) ∈ δ((z1, z2), ε, A), when (z1′, B1 . . . Bk) ∈ δ1(z1, ε, A)
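The two rules for δ translate into a short loop. The sketch below only builds the transition function of the product automaton; the names product_delta, DELTA1 and DELTA2 and the dictionary encodings are assumptions made for this illustration, and the DFA transition function is assumed to be total. The example combines some push-down transitions with a DFA that tracks whether the number of a's read so far is even (state "e") or odd (state "o").

```python
def product_delta(delta1, dfa_states, delta2):
    """Transition function of the cross product of a PDA and a DFA.

    delta1 maps (z1, input symbol or "" for ε, top) to a set of (z1', pushed).
    delta2 maps (z2, input symbol) to z2' and is assumed to be total.
    """
    delta = {}
    for (z1, a, top), moves in delta1.items():
        for z2 in dfa_states:
            # on an ε-move the DFA component does not change
            z2_next = z2 if a == "" else delta2[(z2, a)]
            target = delta.setdefault(((z1, z2), a, top), set())
            for z1_next, pushed in moves:
                target.add(((z1_next, z2_next), pushed))
    return delta

DELTA1 = {
    ("z1", "a", "#"): {("z1", "A#")},
    ("z1", "a", "A"): {("z1", "AA")},
    ("z1", "b", "A"): {("z2", "")},
    ("z2", "b", "A"): {("z2", "")},
    ("z2", "", "#"): {("z2", "")},
}
DELTA2 = {("e", "a"): "o", ("o", "a"): "e", ("e", "b"): "e", ("o", "b"): "o"}
DELTA = product_delta(DELTA1, {"e", "o"}, DELTA2)
```

The start state and the final states of the product automaton are pairs as well, namely (z0^1, z0^2) and E1 × E2, as in the definition of M′.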
Closure Properties
No closure under complement
When L is a context free language, then its complement Σ∗ \ L is not necessarily context free.
Proof: Assume that the context free languages are closed under complement. By De Morgan's law, L1 ∩ L2 is the complement of (Σ∗ \ L1) ∪ (Σ∗ \ L2). Since the context free languages are closed under union, they would then be closed under intersection as well, which is not the case. That is, we obtain a contradiction.
Closure Properties
Closure properties (summary)
The context free languages are closed under:
Union (L1, L2 context free ⇒ L1 ∪ L2 context free)
Product/Concatenation (L1, L2 context free ⇒ L1L2 context free)
Star-Operation (L context free ⇒ L∗ context free)
Intersection with a regular language (L context free, R regular ⇒ L ∩ R context free)
The context free languages are not closed under:
Intersection
Complement
Context Free Languages Decidable Properties
Decidability
Are the following problems decidable?
Word problem: Let a context free language L and a word w ∈ Σ∗ be given. Does w ∈ L hold?
Emptiness problem: Let a context free language L be given. Does L = ∅ hold?
Finiteness problem: Let a context free language L be given. Is L finite?
Equivalence problem: Let two context free languages L1, L2 be given. Does L1 = L2 hold?
Intersection problem: Let two context free languages L1, L2 be given. Does L1 ∩ L2 = ∅ hold?
Decidable means that there exists an algorithm which solves the problem in every case.
Decidability
Decidability of the word problem
The word problem for context free languages is decidable.
We can solve it with the CYK algorithm.
Decidability
Decidability of the emptiness problem
The emptiness problem for context free languages is decidable.
Proof idea: Determine all productive variables, that is, all variables A for which there exists an x ∈ Σ∗ with A ⇒∗ x. The language is empty if and only if the start variable S is not productive.
Decidability
Determine productive variables:
input Grammar G = (V, Σ, P, S)
T := {A ∈ V | there exists an (A → w) ∈ P with w ∈ Σ∗}
repeat
    P′ := {(A → w) ∈ P | for all variables B in w: B ∈ T}
    T := T ∪ {A | there exists a w such that (A → w) ∈ P′}
until T was not modified in the last iteration
return T
Determine whether the language of a grammar is empty:
input Grammar G = (V, Σ, P, S)
T := productive variables of G
return S ∉ T
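The two procedures can be sketched in Python as follows. The representation is an assumption made for this illustration: productions are pairs (A, w) with w a tuple of symbols, and every symbol that is not in the set of variables counts as a terminal. Instead of building T and P′ separately, the loop adds a variable as soon as one of its productions only contains terminals and productive variables, which computes the same set.

```python
def productive_variables(variables, productions):
    """Return the set of variables from which some terminal word can be derived."""
    productive = set()
    changed = True
    while changed:                      # repeat until nothing is added any more
        changed = False
        for lhs, rhs in productions:
            if lhs in productive:
                continue
            # every symbol on the right side is a terminal or already productive
            if all(s in productive or s not in variables for s in rhs):
                productive.add(lhs)
                changed = True
    return productive

def is_empty(variables, productions, start):
    """The language is empty if and only if the start variable is not productive."""
    return start not in productive_variables(variables, productions)

# Illustrative grammar: S → AB, A → a, B → Bb (B never terminates)
VARIABLES = {"S", "A", "B"}
PRODUCTIONS = [("S", ("A", "B")), ("A", ("a",)), ("B", ("B", "b"))]
```

In the example only A is productive, so the language of the grammar is empty; adding a production S → A would make S productive as well.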
Decidability
Decidability of the finiteness problem
The finiteness problem for context free languages is decidable.
Proof idea: Let a context free grammar (in Chomsky normal form) be given. Let n = 2^|V|, that is, the pumping lemma number of the grammar. All words x ∈ L with |x| ≥ n can be pumped up, and all words x ∈ L with |x| ≥ 2n can be pumped down to a shorter word of L that still has length at least n.
This means that L is infinite if and only if there is a word x ∈ L such that n ≤ |x| < 2n. There are only finitely many such words, and so we can test all of them for membership of the language.
Decidability
Undecidability for context free languages
The following problems are undecidable for context free languages, which means that there are no algorithms that solve them in every case:
Equivalence problem: Let two context free languages L1, L2 be given. Does L1 = L2 hold?
Intersection problem: Let two context free languages L1, L2 be given. Does L1 ∩ L2 = ∅ hold?
Remark: In the lecture “Berechenbarkeit und Komplexität” it will be shown how we can prove that such problems are undecidable.
Decidability
The intersection problem is decidable when it is known that one of the two languages L1, L2 is regular.
Decision procedure:
1 Construct the push-down automaton that accepts L1 ∩ L2.
2 Convert it to a context free grammar.
3 Check whether the language generated by the context free grammar is empty.
Context Free Languages Deterministic Push-Down Automata
Deterministic Context Free Languages
We now consider a subclass of push-down automata that can be used to recognize languages deterministically and thus efficiently.
Deterministic Push-Down Automaton (definition)
A deterministic push-down automaton M is a 7-tuple M = (Z, Σ, Γ, δ, z0, #, E), where
(Z, Σ, Γ, δ, z0, #) is a push-down automaton,
E ⊆ Z is the set of final states, and
the transition function δ is deterministic, which means: for all z ∈ Z, a ∈ Σ and A ∈ Γ:
|δ(z, a, A)| + |δ(z, ε, A)| ≤ 1.
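The determinism condition can be checked by going through all combinations of z, a and A. A minimal sketch, assuming δ is stored as a dictionary from (state, input symbol or "" for ε, top stack symbol) to a set of moves; the name is_deterministic and both example transition tables are made up for this illustration.

```python
def is_deterministic(delta, states, alphabet, stack_alphabet):
    """Check |δ(z, a, A)| + |δ(z, ε, A)| <= 1 for all z, a and A."""
    for z in states:
        for top in stack_alphabet:
            eps_moves = len(delta.get((z, "", top), ()))
            for a in alphabet:
                if len(delta.get((z, a, top), ())) + eps_moves > 1:
                    return False
    return True

# Illustrative transition function that satisfies the condition
DET = {
    ("z1", "a", "#"): {("z1", "A#")},
    ("z1", "a", "A"): {("z1", "AA")},
    ("z1", "b", "A"): {("z2", "")},
    ("z2", "b", "A"): {("z2", "")},
    ("z2", "", "#"): {("z2", "")},
}
# The same table with an extra ε-move that competes with reading a or b
NONDET = {**DET, ("z1", "", "A"): {("z2", "A")}}
```

In NONDET the automaton can choose in state z1 with A on top between reading an input symbol and taking the ε-move, so the sum in the condition is 2.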
Deterministic Context Free Languages
Differences between push-down automata and deterministic push-down automata:
Deterministic push-down automata have a set of final states and accept in a final state, not with an empty stack.
For each state z and each stack symbol A it holds that:
either there is at most one ε-transition and no transition that reads an input symbol, or there is no ε-transition and for each alphabet symbol there is at most one transition.
The definitions of configurations and transitions between configurations stay the same.
Deterministic Context Free Languages
Accepted language of a deterministic push-down automaton (definition)
Let M = (Z, Σ, Γ, δ, z0, #, E) be a deterministic push-down automaton. The language accepted by M is:
N(M) = {x ∈ Σ∗ | (z0, x, #) ⊢∗ (z, ε, γ) for some z ∈ E, γ ∈ Γ∗}.
Deterministic Context Free Languages
Deterministic context free languages (definition)
A language is deterministic context free if and only if it is accepted by a deterministic push-down automaton.
Deterministic Context Free Languages
Example: The language L = {w$w^R | w ∈ {a, b}∗} is deterministic context free.
M = ({z1, z2, z3}, {a, b, $}, {#, A, B}, δ, z1, #, {z3}), where δ is defined as follows (we write (z, a, A) → (z′, x) when (z′, x) ∈ δ(z, a, A)):
(z1, a, #) → (z1, A#)    (z1, a, A) → (z1, AA)    (z1, a, B) → (z1, AB)
(z1, b, #) → (z1, B#)    (z1, b, A) → (z1, BA)    (z1, b, B) → (z1, BB)
(z1, $, #) → (z2, #)     (z1, $, A) → (z2, A)     (z1, $, B) → (z2, B)
(z2, a, A) → (z2, ε)     (z2, b, B) → (z2, ε)     (z2, ε, #) → (z3, #)
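Because at most one transition applies in every configuration, this automaton can be run with a simple loop and no search. The sketch below encodes the transition table of the example as a Python dictionary; the function name accepts and the list representation of the stack (top at the end) are choices made for this illustration, and the loop assumes that the automaton has no infinite sequence of ε-moves, which holds for this table.

```python
DELTA = {
    ("z1", "a", "#"): ("z1", "A#"), ("z1", "a", "A"): ("z1", "AA"), ("z1", "a", "B"): ("z1", "AB"),
    ("z1", "b", "#"): ("z1", "B#"), ("z1", "b", "A"): ("z1", "BA"), ("z1", "b", "B"): ("z1", "BB"),
    ("z1", "$", "#"): ("z2", "#"), ("z1", "$", "A"): ("z2", "A"), ("z1", "$", "B"): ("z2", "B"),
    ("z2", "a", "A"): ("z2", ""), ("z2", "b", "B"): ("z2", ""), ("z2", "", "#"): ("z3", "#"),
}
FINAL = {"z3"}

def accepts(word):
    """Run the deterministic automaton on `word`; accept in a final state."""
    state, stack, pos = "z1", ["#"], 0      # the top of the stack is stack[-1]
    while True:
        if pos == len(word) and state in FINAL:
            return True                     # whole input read, final state reached
        if not stack:
            return False
        top = stack[-1]
        if (state, "", top) in DELTA:       # ε-move, no input symbol is read
            symbol = ""
        elif pos < len(word) and (state, word[pos], top) in DELTA:
            symbol = word[pos]
            pos += 1
        else:
            return False                    # no transition applies
        state, pushed = DELTA[(state, symbol, top)]
        stack.pop()
        stack.extend(reversed(pushed))      # leftmost pushed symbol becomes the top
```

Every loop iteration either reads one input symbol or takes the single ε-move into z3, so the number of steps is linear in the length of the word.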
Deterministic Context Free Languages
Example 2: The language L = {ww^R | w ∈ {a, b}∗} is, however, not deterministic context free. (Without proof.)
We have already seen that this language is context free. That means that the class of context free languages and the class of deterministic context free languages are not equal.
The class of deterministic context free languages is a strict subset of the class of context free languages.
Deterministic Context Free Languages
Further remarks:
Efficiency: With deterministic push-down automata we have obtained a procedure for solving the word problem which runs in linear time (as a function of the number of input symbols).
For this, one runs the push-down automaton on the word and checks whether a final state is reached after the whole input has been read.
Deterministic context free grammars: There are also classes of grammars which correspond to the deterministic push-down automata.
These are, however, non-trivial; there are several forms. The most well-known is the class of so-called LR(k) grammars.
Deterministic Context Free Languages
Deterministic context free languages have different closure properties than context free languages.
Closure
Are the deterministic context free languages closed under the following operations?
Union (L1, L2 det. context free ⇒ L1 ∪ L2 det. context free) ?
Intersection (L1, L2 det. context free ⇒ L1 ∩ L2 det. context free) ?
Intersection with a regular language (L1 det. context free, L2 regular ⇒ L1 ∩ L2 det. context free) ?
Complement (L det. context free ⇒ Σ∗ \ L det. context free) ?
Deterministic Context Free Languages
Closure under complement
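When L is a deterministic context free language, then its complement Σ∗ \ L is also deterministic context free.
Informal proof idea: Just like with DFAs, we can construct a push-down automaton for the complement by exchanging final and non-final states. (A full proof needs extra care, for example with ε-transitions and with computations that get stuck before the whole input is read.)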
Deterministic Context Free Languages
No closure under intersection
When L1 and L2 are deterministic context free languages, then L1 ∩ L2 is not necessarily a deterministic context free language.
Proof: The example languages from the argument that context free languages are not closed under intersection are deterministic context free; however, their intersection is not even context free. So we can use the same counter example.
L1 = {a^j b^k c^k | j ≥ 0, k ≥ 0}
L2 = {a^k b^k c^j | j ≥ 0, k ≥ 0}
L1 ∩ L2 = {a^n b^n c^n | n ≥ 0}
Deterministic Context Free Languages
No closure under union
When L1 and L2 are deterministic context free languages, then L1 ∪ L2 is not necessarily a deterministic context free language.
Proof: From closure under union (together with closure under complement) it would follow that the deterministic context free languages are closed under intersection, because by De Morgan's law L1 ∩ L2 is the complement of (Σ∗ \ L1) ∪ (Σ∗ \ L2).
Deterministic Context Free Languages
Deterministic context free languages are closed under intersection with regular languages.
Closure under intersection with regular languages
Let L be a deterministic context free language and R a regular language.Then L ∩ R is a deterministic context free language.
Proof idea: Analogous to context free languages.
Deterministic Context Free Languages
Summary of closure properties
Closed under       Regular   Det. CF   CF
Union              ✓         ✗         ✓
Concatenation      ✓         ✗         ✓
Kleene-Star        ✓         ✗         ✓
Intersection       ✓         ✗         ✗
Inters. w. RL      ✓         ✓         ✓
Complement         ✓         ✓         ✗
Decidability
Decidability with deterministic context free languages
The following problems are decidable for deterministic context free languages (represented by a deterministic push-down automaton):
Word problem: Let a deterministic context free language L and a word w ∈ Σ∗ be given. Does w ∈ L hold?
Decidable with a deterministic push-down automaton in linear time (as a function of the length of the input).
Emptiness problem: Let a deterministic context free language L be given. Does L = ∅ hold?
See the corresponding procedure for context free languages.
Decidability
Decidability with deterministic context free languages
Finiteness problem: Let a deterministic context free language L be given. Is L finite?
See the corresponding procedure for context free languages.
Equivalence problem: Let two deterministic context free languages L1, L2 be given. Does L1 = L2 hold?
This was an open problem for a long time. Only in 1997 was it shown to be decidable, by Géraud Sénizergues.
Decidability
Undecidability with deterministic context free languages
The following problem is not decidable for deterministic context free languages.
Intersection problem: Let two deterministic context free languages L1, L2 be given. Does L1 ∩ L2 = ∅ hold?
As with context free languages, this problem is decidable when one of the two languages is regular.
Context Free Languages Context Free Languages in Practice
Parsing with ANTLR
Context free languages are often used to specify the syntax of computer languages (programming languages, markup languages, etc.).
A parser is a program module that reads the source code of a program and produces some representation of it (for example a syntax tree).
In this lecture we look at ANTLR v4 (ANother Tool for Language Recognition), a parser generator which can be used to generate a parser (in Java).
Parsing with ANTLR
In general, the parser of a compiler or an interpreter consists of two components:
The lexical analyser groups the characters of the input into strings, the so-called tokens.
The parser itself analyses the sequence of tokens and constructs a syntax tree (or another representation).
ANTLR generates both components.
Parsing in Practice
Input: (8 + sqrt(16)) * 3
↓ Lexical analysis
Tokens: ( 8 + sqrt ( 16 ) ) * 3
↓ Parsing
Syntax tree: [Figure: tree whose root 〈expr〉 is the product; its left operand is the parenthesized sum of 8 and sqrt ( 16 ), its right operand is 3]
↓ Further processing
Interlude: Extended Backus-Naur-Form
In practice the Extended Backus-Naur-Form (EBNF) is often used to specify context free grammars. In EBNF the right sides of the productions are not words, but regular expressions.
Interlude: Extended Backus-Naur-Form
EBNF grammars can be easily transformed into normal context free grammars, where B is a new variable in each case:
A → α(β1 | · · · | βn)γ ⇒ A → αBγ, B → β1 | · · · | βn
A → αβ∗γ ⇒ A → αγ | αBγ, B → β | βB
Abbreviations: α+ ≡ αα∗, α? ≡ (ε | α), etc.
Apply the above rules until the grammar does not contain “∗” and (nested) “|” operations anymore.
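As a small illustration of the rule for “∗”, the following sketch rewrites one production of the form A → αβ∗γ. The function name eliminate_star and the list representation of α, β and γ are assumptions made for this example; choosing a name for the new variable is left to the caller.

```python
def eliminate_star(lhs, alpha, beta, gamma, fresh):
    """Rewrite A → α β∗ γ as A → αγ | αBγ and B → β | βB.

    alpha, beta and gamma are lists of symbols; `fresh` is the name of the new
    variable B and must not occur elsewhere in the grammar.
    """
    return {
        lhs: [alpha + gamma, alpha + [fresh] + gamma],
        fresh: [beta, beta + [fresh]],
    }

# program → statement∗ (α and γ are empty here)
RESULT = eliminate_star("program", [], ["statement"], [], "B")
```

For program → statement∗ this yields program → ε | B and B → statement | statement B.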
Parsing with ANTLR
We will generate a parser which recognizes the following grammar:
program → statement∗
statement → ID ‘=’ expression ‘↵’ | expression ‘↵’ | ‘↵’
expression → expression ‘*’ expression | expression ‘/’ expression
           | expression ‘+’ expression | expression ‘-’ expression
           | ‘sqrt’ ‘(’ expression ‘)’
           | ‘(’ expression ‘)’
           | ID | NUMBER
Terminal symbols are written in quotes; ‘↵’ stands for a line break. The tokens ID and NUMBER are recognized by the lexical analyser.
ANTLR resolves ambiguities “automatically”.
Parsing with ANTLR
Example input:
a = 4 + 2 * 5
b = (a / 2) * sqrt(16)
a + b
Parsing with ANTLR
Body of an ANTLR source file:
grammar 〈parser name〉 ;
Rules of the lexical analyser
Rules of the grammar
Parsing with ANTLR
The lexical analyser:
// lexical rules
NUMBER : [0-9]+ | [0-9]* '.' [0-9]+ ;
NEWLINE : '\r'? '\n' ;
SQRT : [sS][qQ][rR][tT] ;
ID : [a-zA-Z] [a-zA-Z0-9]* ;
WHITESPACE : [ \t]+ -> skip ;
Variables of the lexical analyser start with a capital letter.
The command “-> skip” ensures that white space characters are not passed on to the parser.
“sqrt” is a SQRT token, although it also matches ID: when two rules match the same text, ANTLR picks the rule that is listed first.
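How the lexer decides between SQRT and ID can be simulated without ANTLR. The Java sketch below is a simplified model, not ANTLR's implementation: at each position it tries all rules, takes the longest match and, on a tie, the rule listed first. The class TinyLexer, the SYMBOL rule (a stand-in for the quoted terminals of the grammar) and the TYPE:text output format are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified model of the lexical analyser (hypothetical class, not generated by ANTLR).
final class TinyLexer {
    // Rules in declaration order; LinkedHashMap keeps this order.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        // Java alternation is ordered, not longest-match, so the alternative
        // with the decimal point is listed first.
        RULES.put("NUMBER", Pattern.compile("[0-9]*\\.[0-9]+|[0-9]+"));
        RULES.put("NEWLINE", Pattern.compile("\\r?\\n"));
        RULES.put("SQRT", Pattern.compile("[sS][qQ][rR][tT]"));
        RULES.put("ID", Pattern.compile("[a-zA-Z][a-zA-Z0-9]*"));
        RULES.put("WHITESPACE", Pattern.compile("[ \\t]+"));
        // Assumption: one rule standing in for the quoted terminals of the grammar.
        RULES.put("SYMBOL", Pattern.compile("[-+*/()=]"));
    }

    /** Splits the input into tokens of the form TYPE:text; white space is skipped. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestType = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // Only a strictly longer match replaces the current one,
                // so on a tie the rule listed first is kept.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestType = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestType == null) {
                throw new IllegalArgumentException("Unexpected character at position " + pos);
            }
            if (!bestType.equals("WHITESPACE")) {
                tokens.add(bestType + ":" + input.substring(pos, bestEnd));
            }
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("sqrt(16)")); // prints [SQRT:sqrt, SYMBOL:(, NUMBER:16, SYMBOL:)]
        System.out.println(tokenize("sqrtx"));    // prints [ID:sqrtx]
    }
}
```

In the second call the longest match wins: “sqrtx” is a single ID token, not SQRT followed by ID.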
Parsing with ANTLR
Translation of the rule
statement → ID ‘=’ expression ‘↵’ | expression ‘↵’ | ‘↵’
statement : var=ID '=' expression NEWLINE # Assignment
          | expression NEWLINE # PrintExpression
          | NEWLINE # Empty
          ;
“var=” is the name of a subtree; the label after “#” is the name of an alternative.
Variables of the parser start with a lower-case letter.
Parsing with ANTLR
ANTLR tools:
antlr4: Reads a grammar and generates a lexical analyser and a parser (and Java classes which are used to further process the syntax tree generated by the parser).
grun: Program which calls a parser and shows the syntax tree (mainly used for debugging the grammar).
Parsing with ANTLR
Now we want to process the syntax tree further and execute the program.
ANTLR has generated some classes that help us do this.
We write an implementation of ExpressionVisitor<T> to walk through the syntax tree and calculate the results.
For the grammar Expression.g, alternative X and return type T:
public T visitX(ExpressionParser.XContext ctx) {
    T value = ...;
    // Visit subtrees
    return value;
}
Example for:
expression : ...
           | left=expression '+' right=expression # Plus
           | ...
public Double visitPlus(ExpressionParser.PlusContext ctx) {
    return visit(ctx.left) + visit(ctx.right);
}
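The same visitor idea can be tried out without the ANTLR runtime. In the Java sketch below, hand-written node classes stand in for the context classes that ANTLR would generate. All names (Node, Visitor, Evaluator, the node classes) are hypothetical, and the handling of variables is an assumption. The tree for (8 + sqrt(16)) * 3 is built by hand rather than by a parser.

```java
import java.util.HashMap;
import java.util.Map;

// Hand-written stand-ins for the generated syntax tree classes (names hypothetical).
interface Node {
    <T> T accept(Visitor<T> visitor);
}

// One visit method per alternative, as in the generated visitor interface.
interface Visitor<T> {
    T visitNumber(NumberNode node);
    T visitId(IdNode node);
    T visitPlus(PlusNode node);
    T visitTimes(TimesNode node);
    T visitSqrt(SqrtNode node);
}

final class NumberNode implements Node {
    final double value;
    NumberNode(double value) { this.value = value; }
    public <T> T accept(Visitor<T> visitor) { return visitor.visitNumber(this); }
}

final class IdNode implements Node {
    final String name;
    IdNode(String name) { this.name = name; }
    public <T> T accept(Visitor<T> visitor) { return visitor.visitId(this); }
}

final class PlusNode implements Node {
    final Node left, right;
    PlusNode(Node left, Node right) { this.left = left; this.right = right; }
    public <T> T accept(Visitor<T> visitor) { return visitor.visitPlus(this); }
}

final class TimesNode implements Node {
    final Node left, right;
    TimesNode(Node left, Node right) { this.left = left; this.right = right; }
    public <T> T accept(Visitor<T> visitor) { return visitor.visitTimes(this); }
}

final class SqrtNode implements Node {
    final Node argument;
    SqrtNode(Node argument) { this.argument = argument; }
    public <T> T accept(Visitor<T> visitor) { return visitor.visitSqrt(this); }
}

// Computes the value of an expression by visiting its subtrees.
final class Evaluator implements Visitor<Double> {
    // Assumption: variable values are kept in a simple map.
    final Map<String, Double> variables = new HashMap<>();

    public Double visitNumber(NumberNode node) { return node.value; }

    public Double visitId(IdNode node) {
        Double value = variables.get(node.name);
        if (value == null) {
            throw new IllegalStateException("Undefined variable: " + node.name);
        }
        return value;
    }

    public Double visitPlus(PlusNode node) {
        return node.left.accept(this) + node.right.accept(this);
    }

    public Double visitTimes(TimesNode node) {
        return node.left.accept(this) * node.right.accept(this);
    }

    public Double visitSqrt(SqrtNode node) {
        return Math.sqrt(node.argument.accept(this));
    }
}

final class VisitorDemo {
    public static void main(String[] args) {
        // Syntax tree of (8 + sqrt(16)) * 3
        Node tree = new TimesNode(
                new PlusNode(new NumberNode(8), new SqrtNode(new NumberNode(16))),
                new NumberNode(3));
        System.out.println(tree.accept(new Evaluator())); // prints 36.0
    }
}
```

With ANTLR, the node classes and the visitor interface are generated from the grammar, so only a class like Evaluator has to be written by hand.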
Parsing with ANTLR
For self study: http://www.antlr.org/.
Conclusion
[Figure: nested language classes, each contained in the one before]
All languages
Semi-decidable languages (type 0)
Context sensitive languages (type 1)
Context free languages (type 2)
Det. context free languages
Regular languages (type 3)
[Figure detail: topics covered in this lecture]
Solving the word problem
Regular languages (3): Pumping Lemma for RL, regular grammars, DFAs and NFAs, regular expressions, Myhill–Nerode equivalence
Context free languages (2): Pumping Lemma for CFL, context free grammars, push-down automata
Det. context free languages: det. push-down automata
Summary
Closed under            Regular   Det. CF   Context free
Union                   ✓         ✗         ✓
Concatenation           ✓         ✗         ✓
Kleene star             ✓         ✗         ✓
Intersection            ✓         ✗         ✗
Inters. with regular    ✓         ✓         ✓
Complement              ✓         ✓         ✗

Problem decidable       Regular   Det. CF   Context free
Word problem            ✓         ✓         ✓
Emptiness               ✓         ✓         ✓
Finiteness              ✓         ✓         ✓
Intersection problem    ✓         ✗         ✗
Inters. pr. with reg.   ✓         ✓         ✓
Equivalence             ✓         ✓         ✗
Overview
Applications:
Regular languages
  Verification
  Searching and replacing in text editors
  Lexical analysis
Context free languages
  Specification of computer languages (programming languages, HTML, XML, arithmetic expressions, . . . )
  Specification of natural languages
  Verification: the stack is used to model function calls
Outlook: “Berechenbarkeit und Komplexität” (Computability and Complexity)
Decidability:
the focus is on context sensitive languages, semi-decidable languages and undecidable languages
automaton model: Turing machine
Solving the word problem ⇐⇒ Solving a decision problem ⇐⇒ Computing the characteristic function
Undecidable problems / uncomputable functions
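The link between the word problem and the characteristic function can be made concrete for a decidable language. The Java sketch below uses the language L = { a^n b^n | n ≥ 0 } as an example chosen for this illustration; the class WordProblem and its methods are hypothetical names. Deciding whether w ∈ L is the same as computing χ_L(w), which is 1 for words in L and 0 otherwise.

```java
// Illustration only: the word problem for L = { a^n b^n | n >= 0 }
// solved by computing the characteristic function of L.
final class WordProblem {

    /** Characteristic function of L: returns 1 if w is in L, otherwise 0. */
    static int chi(String w) {
        int n = w.length();
        if (n % 2 != 0) {
            return 0; // words in L have even length
        }
        // The first half must consist of a's, the second half of b's.
        for (int i = 0; i < n / 2; i++) {
            if (w.charAt(i) != 'a' || w.charAt(n - 1 - i) != 'b') {
                return 0;
            }
        }
        return 1;
    }

    /** The word problem as a decision problem: is w in L? */
    static boolean inLanguage(String w) {
        return chi(w) == 1;
    }

    public static void main(String[] args) {
        System.out.println(chi("aabb")); // prints 1
        System.out.println(chi("abab")); // prints 0
    }
}
```

For an undecidable language no such program exists: its characteristic function is not computable.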
Complexity:
Complexity of algorithms ⇒ Complexity classes