languages and machines unit two: regular languages and finite state automata

Post on 22-Dec-2015

Languages and Machines

Unit two: Regular languages and Finite State Automata

2

Review of week one

• A language is a set of strings (the set of different things you can say). May be infinite.

• A string is a finite sequence of symbols. Its length may be zero (the empty string) and is always some finite number.

• A symbol is just some mark on the page or screen. A language has a finite alphabet of symbols.

3

Review of week one

• In a context-dependent language, the meaning of a phrase depends on the context

• In a context-sensitive language, the structure of a phrase depends on the context

• Most natural languages are context-dependent but not context-sensitive

• A context-free language is one where the structure of a phrase is always the same, independent of context

• A regular language is a context-free language which has simple rules for forming valid strings (e.g. "94", "getWidth()")

4

Classes of formal language

[Figure: nested classes of formal language, from innermost to outermost: regular, context-free, context-sensitive, phrase structure]

5

Regular languages

• Here are examples of strings from a regular language with alphabet {a,b}:
  • ε (the empty string)
  • a
  • b
  • ab
  • aaaaa
  • ababab

6

Regular languages

1. the empty set is a regular language

2. the set consisting of the empty string (ε) is a regular language

3. the set consisting of a one-symbol string is a regular language

4. a new regular language can be made by taking each string from one regular language and concatenating it with each string from another regular language

5. a new regular language can be made by taking the union of two regular languages
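The five rules can be tried out on small finite languages using ordinary Python sets. This is only a sketch: the names `EMPTY`, `EPSILON`, `symbol`, `concat` and `union` are illustrative choices, not notation from the slides.

```python
# Regular languages built from the five rules, modelled as sets of strings.
# All names here are illustrative assumptions, not part of the slides.

EMPTY = frozenset()          # rule 1: the empty set
EPSILON = frozenset({""})    # rule 2: the set holding only the empty string


def symbol(s):
    """Rule 3: the set consisting of a single one-symbol string."""
    assert len(s) == 1
    return frozenset({s})


def concat(l1, l2):
    """Rule 4: every string of l1 followed by every string of l2."""
    return frozenset(x + y for x in l1 for y in l2)


def union(l1, l2):
    """Rule 5: the strings that belong to either language."""
    return l1 | l2


a, b = symbol("a"), symbol("b")
a_or_b = union(a, b)                   # the language {a, b}
two_symbols = concat(a_or_b, a_or_b)   # the language {aa, ab, ba, bb}
print(sorted(two_symbols))
```

Concatenating with `EPSILON` leaves a language unchanged, while concatenating with `EMPTY` gives the empty set, which is why the two are listed as separate rules.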

7

Recognizing regular languages

• regular languages can be recognized and interpreted by a finite-state machine

• for example, here is a machine to recognize a two-bit string:

[Figure: state diagram with transitions labelled 0 and 1; the acceptor states are marked]
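A recognizer of this kind can be written as a transition table plus a loop. The sketch below is one possible machine for two-bit strings; the state names and the table layout are assumptions and need not match the diagram on the slide.

```python
# A finite-state recognizer for two-bit strings.
# State names and table layout are assumptions made for illustration.

TRANSITIONS = {
    ("start", "0"): "one_bit",
    ("start", "1"): "one_bit",
    ("one_bit", "0"): "two_bits",
    ("one_bit", "1"): "two_bits",
}
ACCEPTING = {"two_bits"}


def accepts(text):
    """Return True if the machine ends in an acceptor state."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:      # no transition for this symbol: reject
            return False
    return state in ACCEPTING


print(accepts("01"), accepts("0"), accepts("011"))
```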

8

Regular expressions

Wouldn’t it be nice if we had a compact way of specifying a regular language?

• we have!

• it’s a special notation called a regular expression

9

Examples of regular languages

1. the set of all two-symbol strings containing the letters a and b: (a|b)^2

2. the set of all two-bit strings: (0|1)^2

3. the set of all possible words: (a|..|z)+

4. the set of all decimal integers: (0|(1|..|9)(0|..|9)*)

5. the set of Java identifiers: JavaLetter JavaLetterOrDigit*

10

More examples of regular languages

1. all the possible three-bit strings: (0|1)^3

2. all the single-digit decimal numbers: (0|1|2|3|4|5|6|7|8|9), abbreviated (0|..|9)

3. all the possible repetitions of the traffic-light sequence (red, amber, green, amber): (red amber green amber)*
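Several of these expressions can be tested directly with Python's `re` module. The translation is an assumption of this sketch, since the slides do not name a regex dialect: a superscript n becomes `{n}` and a range such as (a|..|z) becomes the character class `[a-z]`.

```python
import re

# Slide notation translated into Python `re` syntax (a translation choice,
# not something the slides prescribe).
PATTERNS = {
    "two-bit string": r"(0|1){2}",        # (0|1)^2
    "three-bit string": r"(0|1){3}",      # (0|1)^3
    "word": r"[a-z]+",                    # (a|..|z)+
    "decimal integer": r"0|[1-9][0-9]*",  # (0|(1|..|9)(0|..|9)*)
}


def in_language(name, text):
    """True if the whole of `text` matches the named expression."""
    return re.fullmatch(PATTERNS[name], text) is not None


print(in_language("decimal integer", "42"))   # a valid integer
print(in_language("decimal integer", "042"))  # leading zero is not allowed
```

`re.fullmatch` is used rather than `re.search` because membership in a language concerns the whole string, not a substring.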

11

Activity

Write down the regular expressions denoting the following regular languages:

• The language with two strings “the cat” and “the mat”

• Arithmetic expressions with two operands, e.g. 1 + 2, 3 × 4
  The allowed operators are: +, -, ×, ÷
  The allowed operands are: single-digit decimal numbers

• The language consisting of all possible binary strings

• The language of HTML tags such as <HEAD>

12

Suggested Answers

• The language with two strings “the cat” and “the mat”:
  the (cat | mat) or the (c|m)at

• Arithmetic expressions with two operands, e.g. 1 + 2, 3 × 4:
  (0|..|9) (+|-|×|÷) (0|..|9)

• The language consisting of all possible binary strings:
  (0|1)*

• The language of HTML tags such as <HEAD>:
  < (A|..|Z)+ >
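The suggested answers can be checked mechanically. The sketch below rewrites them in Python `re` syntax; the character classes, the escaped `-`, and the single spaces around the operator (taken from the examples 1 + 2 and 3 × 4) are assumptions of the translation.

```python
import re

# Suggested answers in Python `re` syntax (translation choices are assumptions).
CAT_MAT = r"the (cat|mat)"
ARITHMETIC = r"[0-9] [+\-×÷] [0-9]"   # operand, space, operator, space, operand
BINARY = r"(0|1)*"
HTML_TAG = r"<[A-Z]+>"


def in_language(pattern, text):
    """True if the whole of `text` belongs to the language of `pattern`."""
    return re.fullmatch(pattern, text) is not None


for pattern, text in [(CAT_MAT, "the mat"), (ARITHMETIC, "3 × 4"),
                      (BINARY, "0110"), (HTML_TAG, "<HEAD>")]:
    print(text, in_language(pattern, text))
```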

13

A cautionary note

• You have been using a metalanguage!

• The regular expression strings form a language having terminal symbols ( ) + * | plus literal symbols e.g. a stands for the letter a

• this can cause problems when the metalanguage and the language get confused e.g. the language consisting of strings of one to three vertical bars:

| | || | |||

14

A cautionary note

• we can fix this by some ghastly escape convention, e.g. convert the above to

"|" | "||" | "|||"

• now we have problems with the quote symbol!

• the best idea is to choose metalanguage symbols which are rarely encountered in the language being described, and use bold-face or color to distinguish
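The same clash shows up in practical regex dialects. In Python's `re` module, for instance, `|` is a metasymbol, so the vertical-bar language has to be written with escapes. A small sketch (the names `BAR` and `ONE_TO_THREE_BARS` are made up for this example):

```python
import re

# "|" is a metasymbol in Python's regex dialect, so a literal bar is escaped.
BAR = re.escape("|")   # the two-character pattern: backslash, bar

# One, two or three literal bars, separated by the metalanguage "|".
ONE_TO_THREE_BARS = "|".join(BAR * n for n in (1, 2, 3))


def is_bars(text):
    """True if `text` is a string of one to three vertical bars."""
    return re.fullmatch(ONE_TO_THREE_BARS, text) is not None


print(ONE_TO_THREE_BARS)
print(is_bars("||"), is_bars("||||"))
```

Printing the pattern shows how quickly the escaped form becomes hard to read, which is the point of the cautionary note.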

15

Regular languages and regular expressions

1. Regular language: the empty set
   Regular expression: ∅

2. Regular language: the set consisting of the empty string (ε)
   Regular expression: ε

3. Regular language: the set consisting of a one-symbol string (e.g. "a")
   Regular expression: a

4. Regular language: a new regular language can be made by taking a string from a regular language and concatenating it with a string from a regular language
   Regular expression: a b

5. Regular language: a new regular language can be made by taking the union of two regular languages
   Regular expression: a | b

16

Regular languages and regular expressions

The other ways of forming regular expressions are just shorthand:

a^0 = ε

a^1 = a

a^2 = aa

a* = ε | a | aa | aaa | ...

a+ = a | aa | aaa | ...
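These shorthands can be expanded by repeated concatenation. The sketch below works on finite sets of strings; `power`, `star_up_to` and `plus_up_to` are hypothetical helper names, and because a* and a+ are infinite the code only enumerates them up to a chosen number of repetitions.

```python
# Repetition shorthands expanded on finite languages (sets of strings).
# Helper names are assumptions; star and plus are cut off at `limit` repetitions.

def concat(l1, l2):
    """Every string of l1 followed by every string of l2."""
    return {x + y for x in l1 for y in l2}


def power(lang, n):
    """lang^n: n copies of lang concatenated; lang^0 is the set {""}."""
    result = {""}
    for _ in range(n):
        result = concat(result, lang)
    return result


def star_up_to(lang, limit):
    """Union of lang^0 .. lang^limit, a finite slice of lang*."""
    result = set()
    for n in range(limit + 1):
        result |= power(lang, n)
    return result


def plus_up_to(lang, limit):
    """Union of lang^1 .. lang^limit, a finite slice of lang+."""
    result = set()
    for n in range(1, limit + 1):
        result |= power(lang, n)
    return result


print(sorted(star_up_to({"a"}, 3)))
print(sorted(plus_up_to({"a"}, 3)))
```

The only difference between the two slices is the empty string, which belongs to a* but not to a+.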

17

Regular languages and regular expressions

• Brackets are used to show precedence of the operations, e.g. (a | b)* is not the same as a | b*

• default precedence, from highest to lowest, is:
  1. * or + or ^n
  2. concatenation
  3. |
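Python's `re` module follows the same precedence order, so the effect of brackets can be observed directly. A short sketch (the helper name `in_language` is an assumption):

```python
import re


def in_language(pattern, text):
    """True if the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None


# (a|b)* : any mixture of a's and b's, including the empty string
# a|b*   : either a single a, or a run of b's (star binds tighter than |)
# ab*    : an a followed by a run of b's (star binds tighter than concatenation)
# (ab)*  : repetitions of the pair ab
for pattern in [r"(a|b)*", r"a|b*", r"ab*", r"(ab)*"]:
    print(pattern, [s for s in ["", "a", "bbb", "abbb", "abab", "abba"]
                    if in_language(pattern, s)])
```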

18

Activity

Give examples of strings from the following languages:

1. (x | y | z)^3

2. x | y | z*

3. a b^2

4. (a b)^2

19

Suggested Answers

Give examples of strings from the following languages:

1. (x | y | z)^3   e.g. xzy

2. x | y | z*   e.g. ε

3. a b^2   e.g. abb

4. (a b)^2   e.g. abab

20

From Regular Expressions to Finite State Automata

1. It is an amazing fact that any regular expression has an equivalent finite state automaton which recognizes exactly the language it denotes

2. and every finite state automaton recognizes the language of some regular expression

• we will prove these propositions later

21

Finite State Machines

• an FSM to add two binary numbers

[Figure: state diagram of the adder with states A to F; callouts mark the start state, a transition, an input symbol, an end state and an output symbol]
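A machine with output symbols can be simulated with a table that maps a state and an input to a next state and an output. The sketch below is a serial binary adder with two carry states; this layout is an assumption chosen for brevity and differs from the six states A to F in the slide's diagram.

```python
# A finite-state machine with output that adds two binary numbers bit by bit,
# least significant bit first. The two carry states are an assumption of this
# sketch, not the states shown on the slide.

# (state, (bit_x, bit_y)) -> (next_state, output_bit)
STEP = {
    ("no_carry", (0, 0)): ("no_carry", 0),
    ("no_carry", (0, 1)): ("no_carry", 1),
    ("no_carry", (1, 0)): ("no_carry", 1),
    ("no_carry", (1, 1)): ("carry", 0),
    ("carry", (0, 0)): ("no_carry", 1),
    ("carry", (0, 1)): ("carry", 0),
    ("carry", (1, 0)): ("carry", 0),
    ("carry", (1, 1)): ("carry", 1),
}


def add_binary(x, y):
    """Add two binary strings written most significant bit first."""
    width = max(len(x), len(y)) + 1        # extra column for a final carry
    x, y = x.zfill(width), y.zfill(width)
    state, out = "no_carry", []
    for bx, by in zip(reversed(x), reversed(y)):
        state, bit = STEP[(state, (int(bx), int(by)))]
        out.append(str(bit))
    return "".join(reversed(out)).lstrip("0") or "0"


print(add_binary("1", "1"))
print(add_binary("101", "11"))
```

The state is the only memory the machine has: it records whether a carry is pending, and nothing else about the digits already read.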

22

Finite state automata

• These are simple machines with no output symbols

• they can only recognize strings of input symbols

• acceptance is shown by a special state

23

NFAs

• The kind of finite state automata we shall be using are called nondeterministic finite automata

• "nondeterministic" means we can do naughty things like:
  • have a transition without a symbol
  • label two exit transitions with the same symbol
  • not show the paths which lead to failure

24

Example of an NFA

• what regular language does this NFA represent?

  a b | a b c | a+

[Figure: NFA diagram with transitions labelled a, b and c]
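An NFA can be simulated by keeping track of the set of states it could be in after each symbol. The sketch below uses one possible NFA for a b | a b c | a+; the state numbering and the use of symbol-free transitions out of state 0 are assumptions and need not match the diagram.

```python
# Simulating an NFA by tracking the set of states it could currently be in.
# The state numbering is an assumption; it is one possible NFA for
# a b | a b c | a+  and need not match the slide's diagram.

EPS = ""   # label for a transition without a symbol

NFA = {
    (0, EPS): {1, 4, 8},
    (1, "a"): {2}, (2, "b"): {3},                  # branch for a b
    (4, "a"): {5}, (5, "b"): {6}, (6, "c"): {7},   # branch for a b c
    (8, "a"): {9}, (9, "a"): {9},                  # branch for a+
}
START, ACCEPTING = 0, {3, 7, 9}


def closure(states):
    """Add every state reachable through symbol-free transitions."""
    stack, seen = list(states), set(states)
    while stack:
        for nxt in NFA.get((stack.pop(), EPS), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def accepts(text):
    current = closure({START})
    for ch in text:
        moved = set()
        for state in current:
            moved |= NFA.get((state, ch), set())
        current = closure(moved)   # paths with no transition simply drop out
    return bool(current & ACCEPTING)


print([s for s in ["", "a", "ab", "abc", "aaaa", "abb"] if accepts(s)])
```

All three "naughty things" appear here: state 0 has transitions without a symbol, three branches start with the same symbol a, and failing paths are not listed in the table at all.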

25

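Examples of conversion from REs to NFAs

• (a b)^2

• a b^2

• (a | b)^2

• (a | b)*

[Figure: one NFA per expression; the first is a chain labelled a b a b, the second a chain labelled a b b, and the remaining two use transitions labelled a and b]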
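The conversion can also be done mechanically, one operator at a time. The sketch below follows the general idea of Thompson's construction: each operator glues smaller NFA fragments together with symbol-free transitions. The function names and the fragment representation are assumptions, and the resulting NFAs have more states than hand-drawn ones.

```python
import itertools

# Building NFAs from regular-expression operators, Thompson style.
# Names and data layout are assumptions made for this sketch.

_ids = itertools.count()   # supplies fresh state numbers
EPS = None                 # label for a transition without a symbol


class Fragment:
    """An NFA piece: one start state, one accept state, a list of edges."""
    def __init__(self, start, accept, edges):
        self.start, self.accept, self.edges = start, accept, edges


def sym(ch):
    s, t = next(_ids), next(_ids)
    return Fragment(s, t, [(s, ch, t)])


def cat(f, g):
    return Fragment(f.start, g.accept,
                    f.edges + g.edges + [(f.accept, EPS, g.start)])


def alt(f, g):
    s, t = next(_ids), next(_ids)
    extra = [(s, EPS, f.start), (s, EPS, g.start),
             (f.accept, EPS, t), (g.accept, EPS, t)]
    return Fragment(s, t, f.edges + g.edges + extra)


def star(f):
    s, t = next(_ids), next(_ids)
    extra = [(s, EPS, f.start), (s, EPS, t),
             (f.accept, EPS, f.start), (f.accept, EPS, t)]
    return Fragment(s, t, f.edges + extra)


def power(make, n):
    """n >= 1 fresh copies of the fragment built by make(), concatenated."""
    assert n >= 1
    frag = make()
    for _ in range(n - 1):
        frag = cat(frag, make())
    return frag


def closure(frag, states):
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for src, label, dst in frag.edges:
            if src == q and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def accepts(frag, text):
    current = closure(frag, {frag.start})
    for ch in text:
        moved = {dst for src, label, dst in frag.edges
                 if src in current and label == ch}
        current = closure(frag, moved)
    return frag.accept in current


AB_TWICE = power(lambda: cat(sym("a"), sym("b")), 2)   # (a b)^2
A_B_TWICE = cat(sym("a"), power(lambda: sym("b"), 2))  # a b^2
A_OR_B_TWICE = power(lambda: alt(sym("a"), sym("b")), 2)  # (a | b)^2
ANY_AB = star(alt(sym("a"), sym("b")))                 # (a | b)*
print(accepts(AB_TWICE, "abab"), accepts(A_B_TWICE, "abb"))
```

`power` takes a function rather than a fragment so that every repetition gets its own fresh states; reusing one fragment twice would wire its states into a loop.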

26

Activity

Convert the following regular expressions to NFAs:

1. JavaLetter JavaLetterOrDigit*

2. (red amber green amber)*

Convert the following NFAs to REs:

3. [Figure: NFA with transitions labelled a and b]

4. [Figure: NFA with transitions labelled a, b, c and d]

27

Suggested answer

1. [Figure: NFA with transitions labelled javaLetter and javaLetterOrDigit]

2. [Figure: NFA with transitions labelled red, amber, green, amber]

3. (ab)*

4. (ac|bd)+

28

Summary

• regular expressions give us a neat notation for describing regular languages

• nondeterministic finite automata (NFAs) provide a diagrammatic version of regular expressions

• these notations are equivalent

• finite automata theory is crucial in generating lexical analyzers from regular expressions
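To give a flavour of that last point, here is a tiny lexical analyzer driven by regular expressions. It is only a sketch: the token names and patterns are illustrative assumptions, and Python's `re` module does the matching rather than an automaton generated by a lexer tool.

```python
import re

# A tiny lexical analyzer driven by regular expressions.
# Token names and patterns are illustrative assumptions.
TOKEN_SPEC = [
    ("INTEGER", r"0|[1-9][0-9]*"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("OPERATOR", r"[+\-*/()]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})"
                             for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Split `text` into (token name, matched string) pairs."""
    tokens, pos = [], 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(
                f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":      # whitespace is matched but dropped
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("getWidth() + 94"))
```

Each token class is one regular language, and the master pattern is their union, so the whole lexer is itself described by a single regular expression.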