Languages and Machines, Unit Two: Regular Languages and Finite State Automata
Post on 22-Dec-2015
2
Review of week one
• A language is a set of strings (the set of different things you can say). May be infinite.
• A string is a sequence of symbols. Its length may be zero (the empty string) and is always finite.
• A symbol is just some mark on the page or screen. A language has a finite alphabet of symbols.
3
Review of week one
• In a context-dependent language, the meaning of a phrase depends on the context
• In a context-sensitive language, the structure of a phrase depends on the context
• Most natural languages are context-dependent but not context-sensitive
• A context-free language is one where the structure of a phrase is always the same, independent of context
• A regular language is a context-free language which has simple rules for forming valid strings (e.g. "94", "getWidth()")
5
Regular languages
• Here are examples of strings from a regular language with alphabet {a,b}:
• ε (the empty string)
• a
• b
• ab
• aaaaa
• ababab
6
Regular languages
1. the empty set is a regular language
2. the set consisting of the empty string (ε) is a regular language
3. the set consisting of a one-symbol string is a regular language
4. a new regular language can be made by taking a string from one regular language and concatenating it with a string from another regular language
5. a new regular language can be made by taking the union of two regular languages
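The five rules can be tried out directly if a language is modelled as a Python set of strings. This is only a sketch: the names `EMPTY_SET`, `EMPTY_STRING`, `symbol`, `concat` and `union` are made up for illustration, and because Python sets are finite it only covers finite languages.

```python
# A language is modelled as a Python set of strings (illustrative names only).
EMPTY_SET = set()        # rule 1: the empty set
EMPTY_STRING = {""}      # rule 2: the set containing only the empty string


def symbol(s):
    """Rule 3: the language containing a single one-symbol string."""
    assert len(s) == 1
    return {s}


def concat(l1, l2):
    """Rule 4: every string of l1 followed by every string of l2."""
    return {x + y for x in l1 for y in l2}


def union(l1, l2):
    """Rule 5: the strings that belong to either language."""
    return l1 | l2


a, b = symbol("a"), symbol("b")
ab = concat(a, b)                        # the language {"ab"}
a_or_b = union(a, b)                     # the language {"a", "b"}
two_symbols = concat(a_or_b, a_or_b)     # all two-symbol strings over a and b
```

Note that concatenating with `EMPTY_STRING` leaves a language unchanged, while concatenating with `EMPTY_SET` gives the empty set.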
7
Recognizing regular languages
• regular languages can be recognized and interpreted by a finite-state machine
• for example, here is a machine to recognize a two-bit string:
[Figure: state diagram whose transitions are labelled 0 and 1, with the acceptor states marked]
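A recognizer for two-bit strings can be sketched as a transition table. The state numbering here is an assumption (the state is the number of bits read so far) and need not match the diagram on the slide.

```python
# Assumed layout: the state is the number of bits read so far.
# Any missing table entry means "no transition", so the string is rejected.
TRANSITIONS = {
    (0, "0"): 1, (0, "1"): 1,
    (1, "0"): 2, (1, "1"): 2,
}
START = 0
ACCEPTING = {2}


def accepts(string):
    """Return True if the machine ends in an acceptor state after reading string."""
    state = START
    for symbol in string:
        state = TRANSITIONS.get((state, symbol))
        if state is None:      # no transition for this symbol: reject
            return False
    return state in ACCEPTING
```

For example, `accepts("01")` is true, while `accepts("1")` and `accepts("010")` are false.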
8
Regular expressions
Wouldn’t it be nice if we had a compact way of specifying a regular language?
• we have!
• it’s a special notation called a regular expression
9
Examples of regular languages
1. the set of all two-symbol strings made from the letters a and b: (a|b)²
2. the set of all two-bit strings: (0|1)²
3. the set of all possible words: (a|..|z)+
4. the set of all decimal integers: (0|(1|..|9)(0|..|9)*)
5. the set of Java identifiers: JavaLetter JavaLetterOrDigit*
10
More examples of regular languages
1. all the possible three-bit strings: (0|1)³
2. all the single-digit decimal numbers: (0|1|2|3|4|5|6|7|8|9), abbreviated (0|..|9)
3. all the possible repetitions of the traffic-light sequence (red, amber, green, amber): (red amber green amber)*
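Several of these examples can be checked with Python's `re` module. This is a rough translation, not the notation of the slides: the superscript n is written `{n}`, ranges such as (0|..|9) are written `[0-9]`, and the dictionary keys and the `in_language` helper are made-up names. Python's `re` also offers features that go beyond regular expressions in the sense used here; none of them are used below.

```python
import re

# Illustrative names mapped to Python re patterns.
PATTERNS = {
    "two symbols over a, b": r"(a|b){2}",
    "three-bit strings": r"(0|1){3}",
    "words": r"[a-z]+",
    "decimal integers": r"0|[1-9][0-9]*",
    "traffic lights": r"(redambergreenamber)*",
}


def in_language(name, string):
    """True if the whole string belongs to the named language."""
    return re.fullmatch(PATTERNS[name], string) is not None
```

`re.fullmatch` is used so that the entire string must match, which corresponds to membership in the language rather than a search for a matching substring.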
11
Activity
Write down the regular expressions denoting the following regular languages:
• The language with two strings “the cat” and “the mat”
• Arithmetic expressions with two operands, e.g. 1 + 2, 3 × 4. The allowed operators are: +, -, ×, ÷. The allowed operands are single-digit decimal numbers.
• The language consisting of all possible binary strings
• The language of HTML tags such as <HEAD>
12
Suggested Answers
• The language with two strings “the cat” and “the mat”: the (cat | mat) or (the (c|m)at)
• Arithmetic expressions with two operands, e.g. 1 + 2, 3 × 4: (0|..|9) (+|-|×|÷) (0|..|9)
• The language consisting of all possible binary strings: (0|1)*
• The language of HTML tags such as <HEAD>: < (A|..|Z)+ >
13
A cautionary note
• You have been using a metalanguage!
• The regular expression strings form a language having terminal symbols ( ) + * | plus literal symbols e.g. a stands for the letter a
• this can cause problems when the metalanguage and the language get confused e.g. the language consisting of strings of one to three vertical bars:
| | || | |||
14
A cautionary note
• we can fix this by some ghastly escape convention, e.g. convert the above to
"|" | "||" | "|||"
• now we have problems with the quote symbol!
• the best idea is to choose metalanguage symbols which are rarely encountered in the language being described, and use bold-face or color to distinguish
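Programming languages face the same clash between metalanguage and language. As an illustration, Python's `re` module treats `|` as a metasymbol, and `re.escape` produces a pattern that matches it literally. The helper name `is_bars` is made up for this sketch.

```python
import re

# Unescaped, "|" means alternation; re.escape turns it into a literal bar.
bar = re.escape("|")

# One, two or three literal vertical bars, joined by the alternation metasymbol.
one_to_three_bars = re.compile(f"{bar}|{bar}{bar}|{bar}{bar}{bar}")


def is_bars(string):
    """True if string consists of one to three vertical bars."""
    return one_to_three_bars.fullmatch(string) is not None
```

The escape convention here is a backslash rather than quotes, but the underlying problem is the same: the backslash itself then needs escaping when it is part of the language being described.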
15
Regular languages and regular expressions
Each rule for forming a regular language has a matching regular expression:
1. the empty set: ∅
2. the set consisting of the empty string: ε
3. the set consisting of a one-symbol string (e.g. "a"): a
4. a new regular language made by taking a string from one regular language and concatenating it with a string from another regular language: a b
5. a new regular language made by taking the union of two regular languages: a | b
16
Regular languages and regular expressions
The other ways of forming regular expressions are just shorthand:
a⁰ = ε
a¹ = a
a² = aa
a* = ε | a | aa | aaa | ...
a+ = a | aa | aaa | ...
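The shorthand can be expanded mechanically. In the sketch below a language is a Python set of strings, and the function names `power`, `star_up_to` and `plus_up_to` are made up. Since a* and a+ stand for infinitely many strings, the last two functions only build the expansion up to a chosen number of repetitions.

```python
def power(language, n):
    """L to the power n: n strings from language, concatenated (power 0 gives {""})."""
    result = {""}
    for _ in range(n):
        result = {x + y for x in result for y in language}
    return result


def star_up_to(language, max_repeats):
    """Finite part of L*: the union of powers 0, 1, ..., max_repeats."""
    result = set()
    for n in range(max_repeats + 1):
        result |= power(language, n)
    return result


def plus_up_to(language, max_repeats):
    """Finite part of L+: like star_up_to, but starting from power 1."""
    result = set()
    for n in range(1, max_repeats + 1):
        result |= power(language, n)
    return result
```

With `language = {"a"}`, `star_up_to(language, 3)` contains the empty string, a, aa and aaa, while `plus_up_to(language, 3)` leaves out the empty string.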
17
Regular languages and regular expressions
• Brackets are used to show precedence of the operations: compare (a | b)* with a | b*
• default precedence, highest first, is:
1. * or + or ⁿ
2. concatenation
3. |
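Python's `re` module follows the same precedence order, so it can be used to see the effect of brackets. In this sketch the superscript 2 is written `{2}` and `matches` is a made-up helper.

```python
import re


def matches(pattern, string):
    """True if the whole string is in the language of the pattern."""
    return re.fullmatch(pattern, string) is not None


# (a|b)* : any mixture of a's and b's, including the empty string
mixed = matches(r"(a|b)*", "abba")          # True

# a|b*   : read as a | (b*), i.e. a single a, or any number of b's
unbracketed = matches(r"a|b*", "abba")      # False

# repetition binds tighter than concatenation:
abb = matches(r"ab{2}", "abb")              # True: a followed by two b's
abab = matches(r"(ab){2}", "abab")          # True: ab repeated twice
```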
18
Activity
Give examples of strings from the following languages:
1. (x | y | z)³
2. x | y | z*
3. a b²
4. (a b)²
19
Suggested Answers
Give examples of strings from the following languages:
1. (x | y | z)³: xzy
2. x | y | z*
3. a b²: abb
4. (a b)²: abab
20
From Regular Expressions to Finite State Automata
1. It is an amazing fact that any regular expression has an equivalent finite state automaton which recognizes its language
2. and every finite state automaton recognizes the language of some regular expression
• we will prove these propositions later
21
Finite State Machines
• an FSM to add two binary numbers
[Figure: state diagram with states A to F and transitions labelled 0 and 1; callouts mark the start state, a transition, an input symbol, an output symbol, and the end state]
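To make the idea of a machine with output symbols concrete, here is a sketch of a serial binary adder. It is not the machine in the diagram: it is a hypothetical two-state version in which the state is the current carry, the input symbol is a pair of bits (one from each number, least significant first), and the output symbol is one bit of the sum. All names are assumptions.

```python
# (state, input pair) -> (next state, output bit); hypothetical two-state adder.
TRANSITIONS = {
    ("carry0", (0, 0)): ("carry0", 0),
    ("carry0", (0, 1)): ("carry0", 1),
    ("carry0", (1, 0)): ("carry0", 1),
    ("carry0", (1, 1)): ("carry1", 0),
    ("carry1", (0, 0)): ("carry0", 1),
    ("carry1", (0, 1)): ("carry1", 0),
    ("carry1", (1, 0)): ("carry1", 0),
    ("carry1", (1, 1)): ("carry1", 1),
}
START = "carry0"


def add_binary(x_bits, y_bits):
    """Add two equal-length bit lists, both given least significant bit first."""
    state = START
    output = []
    for pair in zip(x_bits, y_bits):
        state, bit = TRANSITIONS[(state, pair)]
        output.append(bit)
    # Emit the final carry as the most significant bit of the result.
    output.append(1 if state == "carry1" else 0)
    return output
```

For instance, 3 + 1 is `add_binary([1, 1], [1, 0])`, which gives `[0, 0, 1]`, i.e. 4 written least significant bit first.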
22
Finite state automata
• These are simple machines with no output symbols
• they can only recognize strings of input symbols
• acceptance is shown by a special state
23
NFAs
• The kind of finite state automata we shall be using are called nondeterministic finite automata
• "nondeterministic" means we can do naughty things like:
• have a transition without a symbol
• label two exit transitions with the same symbol
• not show the paths which lead to failure
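An NFA can still be run on a computer by tracking the set of states it might be in. The sketch below uses a small made-up NFA for (a | b)* a that shows all three features: state 0 has two exit transitions labelled a, state 1 has a transition without a symbol (written here as the empty string), and failing paths are simply missing from the table.

```python
EPSILON = ""  # label used here for a transition without a symbol

# Hypothetical NFA for (a | b)* a; (state, label) -> set of next states.
NFA = {
    (0, "a"): {0, 1},       # two exits from state 0 share the label a
    (0, "b"): {0},
    (1, EPSILON): {2},      # a transition without a symbol
}
START = 0
ACCEPTING = {2}


def epsilon_closure(states):
    """All states reachable from `states` using only symbol-free transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for target in NFA.get((state, EPSILON), set()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return closure


def accepts(string):
    """True if some path through the NFA reads string and ends in an acceptor state."""
    current = epsilon_closure({START})
    for symbol in string:
        moved = set()
        for state in current:
            moved |= NFA.get((state, symbol), set())   # missing entry: that path fails
        current = epsilon_closure(moved)
    return bool(current & ACCEPTING)
```

The string is accepted if at least one of the possible paths ends in an acceptor state, so `accepts("bba")` is true and `accepts("ab")` is false.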
25
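Examples of conversion from REs to NFAs
• (a b)²: a chain of transitions labelled a, b, a, b
• a b²: a chain of transitions labelled a, b, b
• (a | b)²
• (a | b)*
[Figures: NFA diagrams for each expression, with transitions labelled a and b]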
26
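Activity
Convert the following regular expressions to NFAs:
1. JavaLetter JavaLetterOrDigit*
2. (red amber green amber)*
Convert the following NFAs to REs:
3. [Figure: NFA diagram]
4. [Figure: NFA diagram]
[Figures: the transitions in the two diagrams are labelled a, b, c and d]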
28
Summary
• regular expressions give us a neat notation for describing regular languages
• nondeterministic finite automata (NFAs) provide a diagrammatic version of regular expressions
• these notations are equivalent
• finite automata theory is crucial in generating lexical analyzers from regular expressions
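As a small illustration of the last point, a lexical analyzer can be described as a list of token names paired with regular expressions. The sketch below uses Python's `re` module in place of a generated automaton; the token names, the ASCII-only identifier pattern and the `tokenize` function are all assumptions made for this example.

```python
import re

# Token names paired with patterns (illustrative; identifiers are ASCII-only here).
TOKEN_SPEC = [
    ("INTEGER", r"0|[1-9][0-9]*"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OPERATOR", r"[+\-*/()]"),
    ("SKIP", r"\s+"),
]

# One combined pattern: a union of named groups, one per token kind.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Split text into (token name, matched string) pairs, dropping whitespace."""
    tokens = []
    position = 0
    while position < len(text):
        match = MASTER.match(text, position)
        if match is None:
            raise ValueError(f"unexpected character {text[position]!r} at {position}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        position = match.end()
    return tokens
```

Calling `tokenize("getWidth() + 94")` yields an identifier, two bracket operators, a plus operator and an integer, in that order.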