lexical analysis - keithschwarz.comwhy do lexical analysis? dramatically simplify parsing. eliminate...

216
Lexical Analysis

Upload: others

Post on 17-Aug-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Lexical Analysis

Page 2: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Announcements

● Programming Assignment 1 Out● Due Friday, July 1 at 11:59 PM.

● Check the website for readings:● Decaf Specification● Lexical Analysis● Intro to flex● Programming Assignment 1

Page 3: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Where We Are

Lexical Analysis

Syntax Analysis

Semantic Analysis

IR Generation

IR Optimization

Code Generation

Optimization

SourceCode

MachineCode

Page 4: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

while (ip < z) ++ip;

Page 5: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

w h i l e ( i < z ) \n \t + i p ;

while (ip < z) ++ip;

p + +

Page 6: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

w h i l e ( i < z ) \n \t + i p ;

while (ip < z) ++ip;

p + +

T_While ( T_Ident < T_Ident ) ++ T_Ident

ip z ip

Page 7: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

w h i l e ( i < z ) \n \t + i p ;

while (ip < z) ++ip;

p + +

T_While ( T_Ident < T_Ident ) ++ T_Ident

ip z ip

While

++

Ident

<

Ident Ident

ip z ip

Page 8: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

do[for] = new 0;

Page 9: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

do[for] = new 0;

d o [ f ] n e w 0 ;=o r

Page 10: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

do[for] = new 0;

T_Do [ T_For T_New T_IntConst

0

d o [ f ] n e w 0 ;=o r

] =

Page 11: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

do[for] = new 0;

T_Do [ T_For T_New T_IntConst

0

d o [ f ] n e w 0 ;=o r

] =

Page 12: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Why do Lexical Analysis?

● Dramatically simplify parsing.● Eliminate whitespace.● Eliminate comments.● Convert data early.

● Separate out logic to read source files.● Potentially an issue on multiple platforms.● Can optimize reading code independently of parser.● An extreme example...

Page 13: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

I am having some difficulty compiling a C++ program that I've written.

This program is very simple and, to the best of my knowledge, conforms to all the rules set forth in the C++ Standard. [...]

The program is as follows:

Source: http://stackoverflow.com/questions/5508110/why-is-this-program-erroneously-rejected-by-three-c-compilers

Page 14: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

I am having some difficulty compiling a C++ program that I've written.

This program is very simple and, to the best of my knowledge, conforms to all the rules set forth in the C++ Standard. [...]

The program is as follows:

Source: http://stackoverflow.com/questions/5508110/why-is-this-program-erroneously-rejected-by-three-c-compilers

Page 15: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

I am having some difficulty compiling a C++ program that I've written.

This program is very simple and, to the best of my knowledge, conforms to all the rules set forth in the C++ Standard. [...]

The program is as follows:

Source: http://stackoverflow.com/questions/5508110/why-is-this-program-erroneously-rejected-by-three-c-compilers

c:\dev>g++ helloworld.pnghelloworld.png: file not recognized: File format not recognizedcollect2: ld returned 1 exit status

Page 16: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Goals of Lexical Analysis

● Convert from physical description of a program into sequence of of tokens.

● Each token is associated with a lexeme.● Each token may have optional attributes.● The token stream will be used in the parser to

recover the program structure.

Page 17: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Challenges in Lexical Analysis

● How to partition the program into lexemes?● How to label each lexeme correctly?

Page 18: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Not-so-Great Moments in Scanning

● FORTRAN: Whitespace is irrelevant

DO 5 I = 1,25

DO 5 I = 1.25

Thanks to Prof. Alex Aiken

Page 19: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Not-so-Great Moments in Scanning

● FORTRAN: Whitespace is irrelevant

DO 5 I = 1,25

DO5I = 1.25

Thanks to Prof. Alex Aiken

Page 20: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Not-so-Great Moments in Scanning

● FORTRAN: Whitespace is irrelevant

DO 5 I = 1,25

DO5I = 1.25

● Can be difficult to tell when to partition input.

Thanks to Prof. Alex Aiken

Page 21: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Not-so-Great Moments in Scanning

● C++: Nested template declarations

vector<vector<int>> myVector

Thanks to Prof. Alex Aiken

Page 22: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Not-so-Great Moments in Scanning

● C++: Nested template declarations

vector < vector < int >> myVector

Thanks to Prof. Alex Aiken

Page 23: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Not-so-Great Moments in Scanning

● C++: Nested template declarations

(vector < (vector < (int >> myVector)))

Thanks to Prof. Alex Aiken

Page 24: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Not-so-Great Moments in Scanning

● C++: Nested template declarations

(vector < (vector < (int >> myVector)))

● Again, can be difficult to determine where to split.

Thanks to Prof. Alex Aiken

Page 25: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Not-so-Great Moments in Scanning

● C++: Types nested in templates

Page 26: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Not-so-Great Moments in Scanning

● C++: Types nested in templates

template <typename T>

void MyFunction(T val) {

T::iterator itr;

}

Page 27: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Not-so-Great Moments in Scanning

● C++: Types nested in templates

template <typename T>

void MyFunction(T val) {

typename T::iterator itr;

}

Page 28: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Not-so-Great Moments in Scanning

● C++: Types nested in templates

template <typename T>

void MyFunction(T val) {

typename T::iterator itr;

}

● Can be hard to label tokens correctly.

Page 29: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Not-so-Great Moments in Scanning

● PL/1: Keywords can be used as identifiers.

Thanks to Prof. Alex Aiken

Page 30: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Not-so-Great Moments in Scanning

● PL/1: Keywords can be used as identifiers.

IF THEN THEN THEN = ELSE; ELSE ELSE = IF

Thanks to Prof. Alex Aiken

Page 31: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Not-so-Great Moments in Scanning

● PL/1: Keywords can be used as identifiers.

IF THEN THEN THEN = ELSE; ELSE ELSE = IF

● Can be difficult to determine how to label lexemes.

Thanks to Prof. Alex Aiken

Page 32: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Defining a Lexical Analysis

● Define a set of tokens.● Define the set of lexemes associated with each

token.● Define an algorithm for resolving conflicts that

arise in the lexemes.

Page 33: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Choosing Tokens

Page 34: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

What Tokens are Useful Here?

for (int k = 0; k < myArray[5]; ++k) { cout << k << endl;}

Page 35: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

What Tokens are Useful Here?

for (int k = 0; k < myArray[5]; ++k) { cout << k << endl;}

T_For {T_IntConstant }T_Int <<T_Identifier ;= <( [) ]++

Page 36: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Choosing Good Tokens

● Very much dependent on the language.● Typically:

● Give keywords their own tokens.● Give different punctuation symbols their own

tokens.● Discard irrelevant information (whitespace,

comments)

Page 37: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Defining Sets of Strings

Page 38: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

String Terminology

● An alphabet Σ is a set of characters.● A string over Σ is a finite sequence of elements

from Σ.● Example:

● Σ = {☺, ☼ }● Valid strings include ☺, ☺☺, ☺☼, ☺☺☼, etc.

● The empty string of no characters is denoted ε.

Page 39: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Languages

● A language is a set of strings.● Example: Even numbers

● Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}● L = {0, 2, 4, 6, 8, 10, 12, 14, … }

● Example: C variable names● Σ = ASCII characters● L = {a, b, c, …, A, B, C, …, _, aa, ab, … }

● Example: L = {static}

Page 40: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Regular Languages

● A subset of all languages that can be defined by regular expressions.

● Small subset of all languages, but surprisingly powerful in practice:● Capture a useful class of languages.● Can be implemented efficiently (more on that later).

Page 41: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Regular Expressions

● Any character is a regular expression matching itself.● ε is a regular expression matching the empty string.

● If R1 and R

2 are regular expressions, then

● R1R

2 is a regular expression matching the concatenation of

the languages.

● R1 | R

2 is a regular expression matching the disjunction of the

languages.

● R1* is a regular expression matching the Kleene closure of the

language.● (R) is a regular expression matching R.

Page 42: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Regular Expressions In Action

● Example: Even Numbers

(+|-|ε) (0|1|2|3|4|5|6|7|8|9)* (0|2|4|6|8)

Page 43: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Regular Expressions In Action

● Example: Even Numbers

(+|-|ε) (0|1|2|3|4|5|6|7|8|9)* (0|2|4|6|8)

● Notational simplification: Definitions

Sign = + | -

OptSign = Sign | ε

Digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

EvenDigit = 0 | 2 | 4 | 6 | 8

EvenNumber = OptSign Digit* EvenDigit

Page 44: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Regular Expressions In Action

● Example: Even Numbers

(+|-|ε) (0|1|2|3|4|5|6|7|8|9)* (0|2|4|6|8)

● Notational simplification: Multi-Way Disjunction

Sign = + | -

OptSign = Sign | ε

Digit = [0123456789]

EvenDigit = [02468]

EvenNumber = OptSign Digit* EvenDigit

Page 45: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Regular Expressions In Action

● Example: Even Numbers

(+|-|ε) (0|1|2|3|4|5|6|7|8|9)* (0|2|4|6|8)

● Notational simplification: Ranges

Sign = + | -

OptSign = Sign | ε

Digit = [0-9]

EvenDigit = [02468]

EvenNumber = OptSign Digit* EvenDigit

Page 46: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Regular Expressions In Action

● Example: Even Numbers

(+|-|ε) (0|1|2|3|4|5|6|7|8|9)* (0|2|4|6|8)

● Notational simplification: Zero-or-One

Sign = + | -

OptSign = Sign?

Digit = [0-9]

EvenDigit = [02468]

EvenNumber = OptSign Digit* EvenDigit

Page 47: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Implementing Regular Expressions

● Regular expressions can be implemented using finite automata.

● There are two kinds of finite automata:● NFAs (nondeterministic finite automata), which

we'll see in a second, and● DFAs (deterministic finite automata), which we'll

see later.

● Automata are best explained by example...

Page 48: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A Simple NFA

r o f l

l

o l

start

Page 49: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A Simple NFA

r o f l

l

o l

start

l o l

Page 50: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A Simple NFA

r o f l

l

o l

start

l o l

Page 51: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A Simple NFA

r o f l

l

o l

start

l o l

Page 52: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A Simple NFA

r o f l

l

o l

start

l o l

Page 53: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A Simple NFA

r o f l

l

o l

start

l o l

Page 54: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A Simple NFA

r o f l

l

o l

start

l o l

Page 55: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A Simple NFA

r o f l

l

o l

start

l o l

Page 56: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A Simple NFA

r o f l

l

o l

start

l o l

Page 57: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A Simple NFA

r o f l

l

o l

start

l o l

Page 58: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A Simple NFA

r o f l

l

o l

start

Page 59: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A Simple NFA

r o f l

l

o l

start

l o g

Page 60: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A Simple NFA

r o f l

l

o l

start

l o g

Page 61: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A Simple NFA

r o f l

l

o l

start

l o g

Page 62: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A Simple NFA

r o f l

l

o l

start

l o g

Page 63: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A Simple NFA

r o f l

l

o l

start

l o g

Page 64: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A Simple NFA

r o f l

l

o l

start

l o g

Page 65: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A Simple NFA

r o f l

l

o l

start

l o g

Page 66: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A Simple NFA

r o f l

l

o l

start

l o g

Page 67: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A Simple NFA

r o f l

l

o l

start

l o g

Page 68: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A More Complex NFA

start

ε

ε

ε

a

s

s s

t e

Σ

Page 69: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A More Complex NFA

start

ε

ε

ε

a

s

s s

t e

s k a t e s

Σ

Page 70: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A More Complex NFA

start

ε

ε

ε

a

s

s s

t e

s k a t e s

Σ

Page 71: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A More Complex NFA

start

ε

ε

ε

a

s

s s

t e

s k a t e s

Σ

Page 72: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A More Complex NFA

start

ε

ε

ε

a

s

s s

t e

s k a t e s

Σ

Page 73: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A More Complex NFA

start

ε

ε

ε

a

s

s s

t e

s k a t e s

Σ

Page 74: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A More Complex NFA

start

ε

ε

ε

a

s

s s

t e

s k a t e s

Σ

Page 75: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A More Complex NFA

start

ε

ε

ε

a

s

s s

t e

s k a t e s

Σ

Page 76: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A More Complex NFA

start

ε

ε

ε

a

s

s s

t e

s k a t e s

Σ

Page 77: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A More Complex NFA

start

ε

ε

ε

a

s

s s

t e

s k a t e s

Σ

Page 78: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A More Complex NFA

start

ε

ε

ε

a

s

s s

t e

s k a t e s

Σ

Page 79: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A More Complex NFA

start

ε

ε

ε

a

s

s s

t e

s k a t e s

Σ

Page 80: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A More Complex NFA

start

ε

ε

ε

a

s

s s

t e

s k a t e s

Σ

Page 81: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A More Complex NFA

start

ε

ε

ε

a

s

s s

t e

s k a t e s

Σ

Page 82: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A More Complex NFA

start

ε

ε

ε

a

s

s s

t e

s k a t e s

Σ

Page 83: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A More Complex NFA

start

ε

ε

ε

a

s

s s

t e

s k a t e s

Σ

Page 84: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Simulating an NFA

● Keep track of a set of states, initially the start state and everything reachable by ε-moves.

● For each character in the input:● Maintain a set of next states, initially empty.● For each current state:

– Follow all transitions labeled with the current letter.

– Add these states to the set of new states.

● Add every state reachable by an ε-move to the set of next states.

● Accept if at least one state in the set of states is accepting.

● Complexity: O(|S||Q|2) for strings of length |S| and |Q| states.

Page 85: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

From Regular Expressions to NFAs

● There is a (beautiful!) procedure from converting a regular expression to an NFA.

● Associate each regular expression with an NFA with the following properties:● There is exactly one accepting state.● There are no transitions out of the accepting state.● There are no transitions into the starting state.

start

Page 86: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Base Cases

εstart

Automaton for ε

astart

Automaton for single character a

Page 87: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Construction for R1R

2

Page 88: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

R1

Construction for R1R

2

R2

start start

Page 89: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

R1

Construction for R1R

2

R2

start

Page 90: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

R1

Construction for R1R

2

R2

start ε

Page 91: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

R1

Construction for R1R

2

R2

start ε

Page 92: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Construction for R1 | R

2

Page 93: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Construction for R1 | R

2

R2

start

R1

start

Page 94: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Construction for R1 | R

2

R2

start

R1

start

start

Page 95: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Construction for R1 | R

2

R2

R1start

ε

ε

Page 96: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Construction for R1 | R

2

R2

R1start

ε

ε

Page 97: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Construction for R1 | R

2

R2

R1start

ε

ε

ε

ε

Page 98: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Construction for R1 | R

2

R2

R1start

ε

ε

ε

ε

Page 99: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Construction for R*

Page 100: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Construction for R*

R

start

Page 101: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Construction for R*

R

start start

Page 102: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Construction for R*

R

start start

ε

Page 103: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Construction for R*

R

start

ε

ε ε

Page 104: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Construction for R*

R

start

ε

ε ε

ε

Page 105: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Construction for R*

R

start

ε

ε ε

ε

Page 106: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Overall Result

● Any regular expression of length n can be converted into an NFA with O(n) states.

● Can determine whether a string matches a regular expression in O(|S|n2), where |S| is the length of the string.

● We'll see how to make this O(|S|) later.

Page 107: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

The Story so Far

● We have a way of describing sets of strings for each token using regular expressions.

● We have an implementation of regular expressions using NFAs.

● How do we cut apart the input text?

Page 108: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Lexing Ambiguities

T_For forT_Identifier [A-Za-z_][A-Za-z0-9_]*

Page 109: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Lexing Ambiguities

T_For forT_Identifier [A-Za-z_][A-Za-z0-9_]*

f o tr

Page 110: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Lexing Ambiguities

T_For forT_Identifier [A-Za-z_][A-Za-z0-9_]*

f o tr

f o trf o trf o trf o trf o tr

f o trf o trf o trf o tr

Page 111: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Conflict Resolution

● Assume all tokens are specified as regular expressions.

● Algorithm: Left-to-right scan.● Tiebreaking rule one: Maximal munch.

● Always match the longest possible prefix of the remaining text.

Page 112: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Lexing Ambiguities

T_For forT_Identifier [A-Za-z_][A-Za-z0-9_]*

f o tr

f o trf o trf o trf o trf o tr

f o trf o trf o trf o tr

Page 113: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Lexing Ambiguities

T_For forT_Identifier [A-Za-z_][A-Za-z0-9_]*

f o tr

f o tr

Page 114: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Implementing Maximal Munch

● Given a set of regular expressions, how can we use them to implement maximum munch?

● Idea:● Convert expressions to NFAs.● Run all NFAs in parallel, keeping track of the last

match.● When all automata get stuck, report the last match

and restart the search at that point.

Page 115: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

Page 116: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

Page 117: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 118: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 119: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 120: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 121: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 122: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 123: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 124: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 125: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 126: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 127: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 128: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 129: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 130: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 131: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 132: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 133: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 134: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 135: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 136: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 137: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 138: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 139: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 140: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 141: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 142: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 143: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 144: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 145: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 146: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 147: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 148: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 149: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 150: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 151: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 152: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 153: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 154: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 155: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 156: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 157: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 158: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 159: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 160: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 161: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Page 162: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A Minor Simplification

Page 163: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A Minor Simplification

start d o

start d o u b l e

start Σ

Page 164: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A Minor Simplification

start d o

start d o u b l e

start Σ

Page 165: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A Minor Simplification

d o

d o u b l e

Σ

ε

ε

εstart

Page 166: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Other Conflicts

T_Do doT_Double doubleT_Identifier [A-Za-z_][A-Za-z0-9_]*

Page 167: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Other Conflicts

T_Do doT_Double doubleT_Identifier [A-Za-z_][A-Za-z0-9_]*

d o bu el

Page 168: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Other Conflicts

T_Do doT_Double doubleT_Identifier [A-Za-z_][A-Za-z0-9_]*

d o bu el

d o bu eld o bu el

Page 169: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

More Tiebreaking

● When two regular expressions apply, choose the one with the greater “priority.”

● Simple priority system: which rule was defined first?

Page 170: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Other Conflicts

T_Do doT_Double doubleT_Identifier [A-Za-z_][A-Za-z0-9_]*

d o bu el

d o bu eld o bu el

Page 171: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Other Conflicts

T_Do doT_Double doubleT_Identifier [A-Za-z_][A-Za-z0-9_]*

d o bu el

d o bu el

Page 172: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Other Conflicts

T_Do doT_Double doubleT_Identifier [A-Za-z_][A-Za-z0-9_]*

d o bu el

d o bu elWhy isn't this a

problem?

Page 173: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

One Last Detail...

● We know what to do if multiple rules match.● What if nothing matches?● Trick: Add a “catch-all” rule that matches any

character and reports an error.

Page 174: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Summary of Conflict Resolution

● Construct an automaton for each regular expression.

● Merge them into one automaton by adding a new start state.

● Scan the input, keeping track of the last known match.

● Break ties by choosing higher-precedence matches.

● Have a catch-all rule to handle errors.

Page 175: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Speeding up the Scanner

Page 176: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

DFAs

● The automata we've seen so far have all been NFAs.

● A DFA is like an NFA, but with tighter restrictions:● Every state must have exactly one transition

defined for every letter.● ε-moves are not allowed.

Page 177: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A Sample DFA

Page 178: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A Sample DFA

start

0 0

1

0

1

0

1

1

Page 179: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A Sample DFA

D

start

0 0

1

0

1

0

1

A

C

B1

Page 180: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

A

A

B

B

C

C

D

D

A Sample DFA

D

start

0 0

1

0

1

0

1

A

C

B1

A

B

C

D

0 1

Page 181: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Simulating a DFA

● Begin in the start state.● For each character of the input:

● Follow the indicated transition to the next state.

● Assuming transitions are in a table, time to scan a string of length S is O(|S|).

Page 182: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Speeding up Matching

● In the worst-case, an NFA takes O(|S||Q|2) time to match a string.

● DFAs, on the other hand, take only O(|S|).● There is another (beautiful!) algorithm to

convert NFAs to DFAs.

Lexical Specification

RegularExpressions NFA DFA

Table-DrivenDFA

Page 183: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Subset Construction

● NFAs can be in many states at once, while DFAs can only be in a single state at a time.

● Key idea: Make the DFA simulate the NFA.● Have the states of the DFA correspond to the

sets of states of the NFA.● Transitions between states of DFA correspond

to transitions between sets of states in the NFA.

Page 184: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Formally...

● Given an NFA N = (Σ, Q, δ, s, F), define the DFA D = (Σ', Q', δ', s', F') as follows:● Σ' = Σ● Q' = 2Q

● δ'(S, a) = {ECLOSE(s') | s ∈ S and δ(s, a) = s'}● s' = ECLOSE(s)● F' = { S | s ∃ ∈ S. s ∈ F }

● Here, ECLOSE(s) is the set of states reachable from s via ε-moves.

Page 185: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

From NFA to DFA

Page 186: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

From NFA to DFA

1 2 3d o

4 5 6d o

7u

8b

9l

10e

11 12Σ

0

ε

ε

εstart

Page 187: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

From NFA to DFA

1 2 3d o

4 5 6d o

7u

8b

9l

10e

11 12Σ

0

ε

ε

εstart

Page 188: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

From NFA to DFA

1 2 3d o

4 5 6d o

7u

8b

9l

10e

11 12Σ

0

ε

ε

εstart

0, 1, 4, 11start

Page 189: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

From NFA to DFA

1 2 3d o

4 5 6d o

7u

8b

9l

10e

11 12Σ

0

ε

ε

εstart

0, 1, 4, 11start

Page 190: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

From NFA to DFA

1 2 3d o

4 5 6d o

7u

8b

9l

10e

11 12Σ

0

ε

ε

εstart

0, 1, 4, 11start

2, 5, 122, 5, 12d

Page 191: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

From NFA to DFA

1 2 3d o

4 5 6d o

7u

8b

9l

10e

11 12Σ

0

ε

ε

εstart

0, 1, 4, 11start

2, 5, 122, 5, 12d

Page 192: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

From NFA to DFA

1 2 3d o

4 5 6d o

7u

8b

9l

10e

11 12Σ

0

ε

ε

εstart

0, 1, 4, 11start

2, 5, 122, 5, 12d

Page 193: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

From NFA to DFA

1 2 3d o

4 5 6d o

7u

8b

9l

10e

11 12Σ

0

ε

ε

εstart

0, 1, 4, 11start

2, 5, 122, 5, 12d

Page 194: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

From NFA to DFA

1 2 3d o

4 5 6d o

7u

8b

9l

10e

11 12Σ

0

ε

ε

εstart

0, 1, 4, 11start

2, 5, 122, 5, 12d

12

Σ – d

12

Page 195: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

From NFA to DFA

1 2 3d o

4 5 6d o

7u

8b

9l

10e

11 12Σ

0

ε

ε

εstart

0, 1, 4, 11start

2, 5, 122, 5, 12d

12

Σ – d

12

Page 196: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

From NFA to DFA

1 2 3d o

4 5 6d o

7u

8b

9l

10e

11 12Σ

0

ε

ε

εstart

0, 1, 4, 11start

2, 5, 122, 5, 12d

12

Σ – d

12

Page 197: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

From NFA to DFA

1 2 3d o

4 5 6d o

7u

8b

9l

10e

11 12Σ

0

ε

ε

εstart

0, 1, 4, 11start

2, 5, 122, 5, 12d

12

Σ – d

12

Page 198: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

From NFA to DFA

1 2 3d o

4 5 6d o

7u

8b

9l

10e

11 12Σ

0

ε

ε

εstart

0, 1, 4, 11start

2, 5, 122, 5, 12d

12

Σ – d

3, 6o

3, 6

12

Page 199: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

From NFA to DFA

1 2 3d o

4 5 6d o

7u

8b

9l

10e

11 12Σ

0

ε

ε

εstart

0, 1, 4, 11start

2, 5, 122, 5, 12d

12

Σ – d

3, 6o

3, 6

12

Page 200: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

From NFA to DFA

1 2 3d o

4 5 6d o

7u

8b

9l

10e

11 12Σ

0

ε

ε

εstart

0, 1, 4, 11start

2, 5, 122, 5, 12d

12

Σ – d

3, 6o

3, 6 7 8 9 1010u b l e

12

Page 201: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

From NFA to DFA

1 2 3d o

4 5 6d o

7u

8b

9l

10e

11 12Σ

0

ε

ε

εstart

0, 1, 4, 11start

2, 5, 122, 5, 12d

12

Σ – d

3, 6o

3, 6 7 8 9 1010u b l e

Σ

Σ – o Σ – u Σ – b Σ – l Σ – e

Σ

Σ

12

Page 202: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

From NFA to DFA

1 2 3d o

4 5 6d o

7u

8b

9l

10e

11 12Σ

0

ε

ε

εstart

0, 1, 4, 11start

2, 5, 122, 5, 12d

12

Σ – d

3, 6o

3, 6 7 8 9 1010u b l e

Σ

Σ – o Σ – u Σ – b Σ – l Σ – e

Σ

Σ

12

Page 203: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Modified Subset Construction

● Instead of marking whether a state is accepting, remember which token type it matches.

● Break ties with priorities.● When using DFA as a scanner, consider the

DFA “stuck” if it enters the state corresponding to the empty set.

Page 204: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Performance Concerns

● The NFA-to-DFA construction can introduce exponentially many states.

● Time/memory tradeoff:● Low-memory NFA has higher scan time.● High-memory DFA has lower scan time.

● Could use a hybrid approach by simplifying NFA before generating code.

Page 205: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Real-World Scanning: Python

Page 206: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

w h i l e ( i < z ) \n \t + i p ;

while (ip < z) ++ip;

p + +

T_While ( T_Ident < T_Ident ) ++ T_Ident

ip z ip

While

++

Ident

<

Ident Ident

ip z ip

Page 207: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Python Blocks

● Scoping handled by whitespace:if w == z:

a = b

c = d

else:

e = f

g = h

● What does that mean for the scanner?

Page 208: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Whitespace Tokens

● Special tokens inserted to indicate changes in levels of indentation.

● NEWLINE marks the end of a line.● INDENT indicates an increase in indentation.● DEDENT indicates a decrease in indentation.● Note that INDENT and DEDENT encode

change in indentation, not the total amount of indentation.

Page 209: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

if w == z: a = b c = delse: e = fg = h

Scanning Python

Page 210: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

if w == z: a = b c = delse: e = fg = h

if ident

w

== ident

z

: NEWLINE

INDENT ident

a

= ident

b

NEWLINE

ident

c

= ident

d

NEWLINE

DEDENT else :

INDENT ident

e

= ident

f

NEWLINE

DEDENT ident

g

= ident

h

NEWLINE

Scanning Python

NEWLINE

Page 211: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

if w == z: { a = b; c = d;} else { e = f;}g = h;

if ident

w

== ident

z

: NEWLINE

INDENT ident

a

= ident

b

NEWLINE

ident

c

= ident

d

NEWLINE

DEDENT else :

INDENT ident

e

= ident

f

NEWLINE

DEDENT ident

g

= ident

h

NEWLINE

Scanning Python

NEWLINE

Page 212: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

if w == z: { a = b; c = d;} else { e = f;}g = h;

if ident

w

== ident

z

:

{ ident

a

= ident

b

;

ident

c

= ident

d

;

} else :

ident

e

= ident

f

;

} ident

g

= ident

h

;

Scanning Python

{

Page 213: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Where to INDENT/DEDENT?

● Scanner maintains a stack of line indentations keeping track of all indented contexts so far.

● Initially, this stack contains 0, since initially the contents of the file aren't indented.

● On a newline:● See how much whitespace is at the start of the line.● If this value exceeds the top of the stack:

– Push the value onto the stack.

– Emit an INDENT token.

● Otherwise, while the value is less than the top of the stack:– Pop the stack.

– Emit a DEDENT token.

Source: http://docs.python.org/reference/lexical_analysis.html

Page 214: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Interesting Observation

● Normally, more text on a line translates into more tokens.

● With DEDENT, less text on a line often means more tokens:

if cond1: if cond2: if cond3: if cond4: if cond5: statement1statement2

Page 215: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Summary

● Lexical analysis splits input text into tokens holding a lexeme and an attribute.

● Lexemes are sets of strings often defined with regular expressions.

● Regular expressions can be converted to NFAs and from there to DFAs.

● Maximal-munch using an automaton allows for fast scanning.

● Not all tokens come directly from the source code.

Page 216: Lexical Analysis - KeithSchwarz.comWhy do Lexical Analysis? Dramatically simplify parsing. Eliminate whitespace. Eliminate comments. Convert data early. Separate out logic to read

Next Time

Lexical Analysis

Syntax Analysis

Semantic Analysis

IR Generation

IR Optimization

Code Generation

Optimization

SourceCode

MachineCode

(Plus a little bit here)