Lexical Analysis and Lexical Analyzer Generators (transcript of engelen/courses/COP562105/Ch3.pdf, 2005)
Lexical Analysis and Lexical Analyzer Generators
Chapter 3
COP5621 Compiler Construction, Copyright Robert van Engelen, Florida State University, 2005
The Reason Why Lexical Analysis is a Separate Phase

• Simplifies the design of the compiler
  – Parsing with only 1 token of lookahead (LL(1) or LR(1)) would not be possible otherwise
• Provides an efficient implementation
  – Systematic techniques to implement lexical analyzers by hand or automatically
  – Stream buffering methods to scan the input
• Improves portability
  – Non-standard symbols and alternate character encodings can be more easily translated
Interaction of the Lexical Analyzer with the Parser

[Diagram: the parser repeatedly requests the next token from the lexical analyzer; the lexical analyzer reads the source program and returns a (token, tokenval) pair. Both components consult the symbol table and report errors.]
Attributes of Tokens

[Diagram: the lexical analyzer transforms the input

  y := 31 + 28*x

into the token stream

  <id, "y"> <assign, > <num, 31> <+, > <num, 28> <*, > <id, "x">

which is passed to the parser. Each token may carry an attribute tokenval.]
Tokens, Patterns, and Lexemes

• A token is a classification of lexical units
  – For example: id and num
• Lexemes are the specific character strings that make up a token
  – For example: abc and 123
• Patterns are rules describing the set of lexemes belonging to a token
  – For example: "letter followed by letters and digits" and "non-empty sequence of digits"
Specification of Patterns for Tokens: Terminology

• An alphabet Σ is a finite set of symbols (characters)
• A string s is a finite sequence of symbols from Σ
  – |s| denotes the length of string s
  – ε denotes the empty string, thus |ε| = 0
• A language is a specific set of strings over some fixed alphabet Σ
Specification of Patterns for Tokens: String Operations

• The concatenation of two strings x and y is denoted by xy
• The exponentiation of a string s is defined by
  s^0 = ε
  s^i = s^(i-1)s for i > 0
  (note that sε = εs = s)
Specification of Patterns for Tokens: Language Operations

• Union: L ∪ M = {s | s ∈ L or s ∈ M}
• Concatenation: LM = {xy | x ∈ L and y ∈ M}
• Exponentiation: L^0 = {ε}; L^i = L^(i-1)L
• Kleene closure: L* = ∪ over i = 0, …, ∞ of L^i
• Positive closure: L+ = ∪ over i = 1, …, ∞ of L^i
Specification of Patterns for Tokens: Regular Expressions

• Basis symbols:
  – ε is a regular expression denoting the language {ε}
  – a ∈ Σ is a regular expression denoting {a}
• If r and s are regular expressions denoting languages L(r) and L(s) respectively, then
  – r | s is a regular expression denoting L(r) ∪ L(s)
  – rs is a regular expression denoting L(r)L(s)
  – r* is a regular expression denoting L(r)*
  – (r) is a regular expression denoting L(r)
• A language defined by a regular expression is called a regular set
Specification of Patterns for Tokens: Regular Definitions

• Naming convention for regular expressions:
  d1 → r1
  d2 → r2
  …
  dn → rn
  where each ri is a regular expression over Σ ∪ {d1, d2, …, di-1}
• Each dj occurring in ri is textually substituted by its definition
Specification of Patterns for Tokens: Regular Definitions

• Example:
  letter → A | B | … | Z | a | b | … | z
  digit → 0 | 1 | … | 9
  id → letter ( letter | digit )*
• Recursion cannot be used; this is illegal:
  digits → digit digits | digit
Specification of Patterns for Tokens: Notational Shorthands

• We frequently use the following shorthands:
  r+ = rr*
  r? = r | ε
  [a-z] = a | b | c | … | z
• For example:
  digit → [0-9]
  num → digit+ (. digit+)? ( E (+|-)? digit+ )?
Regular Definitions and Grammars

Grammar:
  stmt → if expr then stmt
       | if expr then stmt else stmt
       | ε
  expr → term relop term | term
  term → id | num

Regular definitions:
  if → if
  then → then
  else → else
  relop → < | <= | <> | > | >= | =
  id → letter ( letter | digit )*
  num → digit+ (. digit+)? ( E (+|-)? digit+ )?
Implementing a Scanner Using Transition Diagrams

relop → < | <= | <> | > | >= | =
id → letter ( letter | digit )*

[Transition diagram for relop: from start state 0, '<' leads to state 1; from state 1, '=' accepts with return(relop, LE), '>' accepts with return(relop, NE), and any other character accepts with return(relop, LT), retracting one character (marked *). From state 0, '=' accepts with return(relop, EQ). From state 0, '>' leads to state 6; from state 6, '=' accepts with return(relop, GE) and any other character accepts with return(relop, GT), with retraction.]

[Transition diagram for id: from start state 9, a letter leads to state 10, which loops on letters and digits; any other character leads to accepting state 11 with retraction, which executes return(gettoken(), install_id()).]
Implementing a Scanner Using Transition Diagrams (Code)

token nexttoken()
{ while (1) {
    switch (state) {
    case 0: c = nextchar();
      if (c==blank || c==tab || c==newline) {
        state = 0;
        lexeme_beginning++;
      }
      else if (c=='<') state = 1;
      else if (c=='=') state = 5;
      else if (c=='>') state = 6;
      else state = fail();
      break;
    case 1: …
    case 9: c = nextchar();
      if (isletter(c)) state = 10;
      else state = fail();
      break;
    case 10: c = nextchar();
      if (isletter(c)) state = 10;
      else if (isdigit(c)) state = 10;
      else state = 11;
      break;
    …

int fail()
{ forward = token_beginning;
  switch (start) {
  case 0: start = 9; break;
  case 9: start = 12; break;
  case 12: start = 20; break;
  case 20: start = 25; break;
  case 25: recover(); break;
  default: /* error */
  }
  return start;
}

fail() decides which other start state is applicable when the current transition diagram cannot match.
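The relop transition diagram can also be hand-coded compactly. The sketch below is an illustration, not part of the slides' scanner (scan_relop and the enum names are hypothetical); it returns the recognized operator and how many characters were consumed, with the one-character retraction of the * states expressed by consuming only one character:

```c
enum relop { LT, LE, EQ, NE, GT, GE, NONE };

/* Scans a relational operator at the start of s, following the
   transition diagram for < | <= | <> | > | >= | =.
   Stores the number of characters consumed in *len. */
enum relop scan_relop(const char *s, int *len)
{
    if (s[0] == '<') {
        if (s[1] == '=') { *len = 2; return LE; }
        if (s[1] == '>') { *len = 2; return NE; }
        *len = 1; return LT;            /* "other": retract one character */
    }
    if (s[0] == '=') { *len = 1; return EQ; }
    if (s[0] == '>') {
        if (s[1] == '=') { *len = 2; return GE; }
        *len = 1; return GT;            /* "other": retract one character */
    }
    *len = 0; return NONE;              /* no relop at this position */
}
```

Each nested if corresponds to one state of the diagram; the lookahead at s[1] plays the role of the transition out of states 1 and 6.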
The Lex and Flex Scanner Generators

• Lex and its newer cousin flex are scanner generators
• They systematically translate regular definitions into C source code for efficient scanning
• The generated code is easy to integrate in C applications
Creating a Lexical Analyzer with Lex and Flex

[Diagram: a lex source program lex.l is fed to the lex or flex compiler, producing lex.yy.c; lex.yy.c is compiled by a C compiler into a.out; running a.out on an input stream yields a sequence of tokens.]
Lex Specification

• A lex specification consists of three parts:
  regular definitions, C declarations in %{ %}
  %%
  translation rules
  %%
  user-defined auxiliary procedures
• The translation rules are of the form:
  p1 { action1 }
  p2 { action2 }
  …
  pn { actionn }
Regular Expressions in Lex

  x        match the character x
  \.       match the character .
  "string" match the contents of the string of characters
  .        match any character except newline
  ^        match the beginning of a line
  $        match the end of a line
  [xyz]    match one character x, y, or z (use \ to escape -)
  [^xyz]   match any character except x, y, and z
  [a-z]    match one of a to z
  r*       closure (match zero or more occurrences)
  r+       positive closure (match one or more occurrences)
  r?       optional (match zero or one occurrence)
  r1r2     match r1 then r2 (concatenation)
  r1|r2    match r1 or r2 (union)
  ( r )    grouping
  r1/r2    match r1 when followed by r2
  {d}      match the regular expression defined by d
Example Lex Specification 1

%{
#include <stdio.h>
%}
%%
[0-9]+ { printf("%s\n", yytext); }
.|\n   { }
%%
main()
{ yylex();
}

yytext contains the matching lexeme; main() invokes the lexical analyzer yylex(). The rules between the %% markers are the translation rules. To build and run:

  lex spec.l
  gcc lex.yy.c -ll
  ./a.out < spec.l
Example Lex Specification 2

%{
#include <stdio.h>
int ch = 0, wd = 0, nl = 0;
%}
delim    [ \t]+
%%
\n       { ch++; wd++; nl++; }
^{delim} { ch+=yyleng; }
{delim}  { ch+=yyleng; wd++; }
.        { ch++; }
%%
main()
{ yylex();
  printf("%8d%8d%8d\n", nl, wd, ch);
}

The first section gives a regular definition (delim); the rules between the %% markers are the translation rules.
Example Lex Specification 3

%{
#include <stdio.h>
%}
digit   [0-9]
letter  [A-Za-z]
id      {letter}({letter}|{digit})*
%%
{digit}+ { printf("number: %s\n", yytext); }
{id}     { printf("ident: %s\n", yytext); }
.        { printf("other: %s\n", yytext); }
%%
main()
{ yylex(); }

The regular definitions precede the translation rules.
Example Lex Specification 4

%{ /* definitions of manifest constants */
#define LT (256)
…
%}
delim   [ \t\n]
ws      {delim}+
letter  [A-Za-z]
digit   [0-9]
id      {letter}({letter}|{digit})*
number  {digit}+(\.{digit}+)?(E[+\-]?{digit}+)?
%%
{ws}     { }
if       {return IF;}
then     {return THEN;}
else     {return ELSE;}
{id}     {yylval = install_id(); return ID;}
{number} {yylval = install_num(); return NUMBER;}
"<"      {yylval = LT; return RELOP;}
"<="     {yylval = LE; return RELOP;}
"="      {yylval = EQ; return RELOP;}
"<>"     {yylval = NE; return RELOP;}
">"      {yylval = GT; return RELOP;}
">="     {yylval = GE; return RELOP;}
%%
int install_id()
…

Each rule returns a token to the parser; yylval carries the token attribute. install_id() installs yytext as an identifier in the symbol table.
Design of a Lexical Analyzer Generator

• Translate regular expressions to an NFA
• Translate the NFA to an efficient DFA (this step is optional)

[Diagram: regular expressions → NFA → DFA; either the NFA or the DFA can be simulated to recognize tokens.]
Nondeterministic Finite Automata

• Definition: an NFA is a 5-tuple (S, Σ, δ, s0, F) where
  S is a finite set of states
  Σ is a finite alphabet of input symbols
  δ is a mapping from S × (Σ ∪ {ε}) to sets of states
  s0 ∈ S is the start state
  F ⊆ S is the set of accepting (or final) states
Transition Graph

• An NFA can be represented diagrammatically by a labeled directed graph called a transition graph

[Diagram: NFA for (a|b)*abb; state 0 loops on a and b, and 0 →a 1 →b 2 →b 3, with state 3 accepting.]

S = {0,1,2,3}
Σ = {a,b}
s0 = 0
F = {3}
Transition Table

• The mapping δ of an NFA can be represented in a transition table:

  State | Input a | Input b
  ------+---------+--------
    0   | {0, 1}  |  {0}
    1   |         |  {2}
    2   |         |  {3}

That is, δ(0,a) = {0,1}, δ(0,b) = {0}, δ(1,b) = {2}, δ(2,b) = {3}.
The Language Defined by an NFA

• An NFA accepts an input string x iff there is some path from the start state to some accepting state in the transition graph whose edges are labeled, in sequence, with the symbols of x
• A state transition from one state to another on the path is called a move
• The language defined by an NFA is the set of input strings it accepts, e.g. (a|b)*abb for the example NFA
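Acceptance by the example NFA for (a|b)*abb can be simulated by tracking the set of reachable states. A sketch using a bit set over states 0–3 and the transition table above (nfa_accepts is a hypothetical name; the input is assumed to be over the alphabet {a,b}):

```c
/* Simulates the example NFA for (a|b)*abb.
   Bit i of the unsigned set represents NFA state i. */
int nfa_accepts(const char *x)
{
    unsigned S = 1u << 0;                       /* start in state 0 */
    for (; *x; x++) {
        unsigned next = 0;
        /* delta(0,a) = {0,1}, delta(0,b) = {0} */
        if (S & (1u << 0))
            next |= (*x == 'a') ? (1u << 0 | 1u << 1) : (1u << 0);
        /* delta(1,b) = {2} */
        if ((S & (1u << 1)) && *x == 'b') next |= 1u << 2;
        /* delta(2,b) = {3} */
        if ((S & (1u << 2)) && *x == 'b') next |= 1u << 3;
        S = next;
    }
    return (S & (1u << 3)) != 0;                /* accept iff state 3 reached */
}
```

The state-set variable S plays the role of "all paths explored so far": a string is accepted exactly when some path ends in accepting state 3.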
Design of a Lexical Analyzer Generator: RE to NFA to DFA

Lex specification with regular expressions:
  p1 { action1 }
  p2 { action2 }
  …
  pn { actionn }

[Diagram: the NFA has a new start state s0 with ε-transitions into the sub-NFAs N(p1), N(p2), …, N(pn); each sub-NFA's accepting state is tagged with its action. The NFA is then converted to a DFA by the subset construction (optional).]
From Regular Expression to NFA (Thompson's Construction)

[Diagrams of the inductive construction:]
• ε: a start state i with an ε-transition to an accepting state f
• a: a start state i with an a-transition to an accepting state f
• r1 | r2: a new start state with ε-transitions into N(r1) and N(r2); the accepting states of N(r1) and N(r2) have ε-transitions to a new accepting state f
• r1r2: N(r1) and N(r2) in series, with the accepting state of N(r1) merged into the start state of N(r2)
• r*: a new start state with ε-transitions to N(r) and to a new accepting state f; the accepting state of N(r) has ε-transitions back to the start of N(r) and to f
Combining the NFAs of a Set of Regular Expressions

a    { action1 }
abb  { action2 }
a*b+ { action3 }

[Diagrams: the NFA for a is 1 →a 2; the NFA for abb is 3 →a 4 →b 5 →b 6; the NFA for a*b+ is 7 →b 8, with an a-loop on state 7 and a b-loop on state 8. The combined NFA adds a new start state 0 with ε-transitions to states 1, 3, and 7.]
Simulating the Combined NFA: Example 1

[Using the combined NFA above on input a a b a:]
  start: {0, 1, 3, 7}
  a: {2, 4, 7}
  a: {7}
  b: {8}
  a: none

Must find the longest match: continue until no further moves are possible; when the last state set reached contains an accepting state, execute its action. Here the last set {8} is accepting for a*b+, so action3 is executed.
Simulating the Combined NFA: Example 2

[Using the combined NFA above on input a b b a:]
  start: {0, 1, 3, 7}
  a: {2, 4, 7}
  b: {5, 8}
  b: {6, 8}
  a: none

When two or more accepting states are reached (here 6 for action2 and 8 for action3), the action given first in the Lex specification is executed: action2.
Deterministic Finite Automata

• A deterministic finite automaton (DFA) is a special case of an NFA
  – No state has an ε-transition
  – For each state s and input symbol a there is at most one edge labeled a leaving s
• Each entry in the transition table is a single state
  – At most one path exists that accepts a string
  – The simulation algorithm is simple
Example DFA

A DFA that accepts (a|b)*abb

[Diagram: states 0, 1, 2, 3 with transitions 0 →a 1, 0 →b 0; 1 →a 1, 1 →b 2; 2 →a 1, 2 →b 3; 3 →a 1, 3 →b 0; state 3 is accepting.]
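The DFA above lends itself to a table-driven simulation: since every entry is a single state, one array lookup per input character suffices. A sketch (dfa_accepts is a hypothetical name) encoding its transition table, with state 3 accepting:

```c
/* Table-driven simulation of the example DFA that accepts (a|b)*abb.
   Row = state; column 0 = input 'a', column 1 = input 'b'. */
int dfa_accepts(const char *x)
{
    static const int trans[4][2] = {
        {1, 0},   /* state 0: a -> 1, b -> 0 */
        {1, 2},   /* state 1: a -> 1, b -> 2 */
        {1, 3},   /* state 2: a -> 1, b -> 3 */
        {1, 0},   /* state 3: a -> 1, b -> 0 */
    };
    int s = 0;
    for (; *x; x++) {
        if (*x != 'a' && *x != 'b') return 0;   /* outside the alphabet */
        s = trans[s][*x == 'b'];
    }
    return s == 3;                              /* accept iff we end in state 3 */
}
```

Unlike the NFA simulation, only one current state is tracked, which is why DFA simulation runs in time proportional to the input length alone.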
Conversion of an NFA into a DFA

• The subset construction algorithm converts an NFA into a DFA using:
  ε-closure(s) = {s} ∪ {t | s →ε … →ε t}
  ε-closure(T) = ∪ over s ∈ T of ε-closure(s)
  move(T,a) = {t | s →a t and s ∈ T}
• The algorithm produces:
  Dstates, the set of states of the new DFA, consisting of sets of states of the NFA
  Dtran, the transition table of the new DFA
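For the combined NFA of a, abb, and a*b+ shown earlier (states 0–8), ε-closure and move can be sketched with bit sets, bit i standing for NFA state i. The function and array names below are hypothetical illustrations of the two definitions:

```c
/* Combined NFA for the patterns a, abb, a*b+ (states 0..8).
   eps[s], on_a[s], on_b[s] hold the successor sets of state s as bit sets. */
static const unsigned eps[9]  = { 1u<<1 | 1u<<3 | 1u<<7, 0, 0, 0, 0, 0, 0, 0, 0 };
static const unsigned on_a[9] = { 0, 1u<<2, 0, 1u<<4, 0, 0, 0, 1u<<7, 0 };
static const unsigned on_b[9] = { 0, 0, 0, 0, 1u<<5, 1u<<6, 0, 1u<<8, 1u<<8 };

/* eclosure(T): all states reachable from T by epsilon-transitions alone. */
unsigned eclosure(unsigned T)
{
    unsigned C = T, prev;
    do {                               /* iterate to a fixed point */
        prev = C;
        for (int s = 0; s < 9; s++)
            if (C & (1u << s)) C |= eps[s];
    } while (C != prev);
    return C;
}

/* move(T,c): all states reachable from T by one transition on symbol c. */
unsigned move(unsigned T, char c)
{
    const unsigned *tab = (c == 'a') ? on_a : on_b;
    unsigned M = 0;
    for (int s = 0; s < 9; s++)
        if (T & (1u << s)) M |= tab[s];
    return M;
}
```

These two functions are exactly the primitives the subset construction iterates: each DFA state is a bit set, and Dtran[T,a] = ε-closure(move(T,a)).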
ε-closure and move Examples

[Using the combined NFA for a, abb, a*b+:]
  ε-closure({0}) = {0,1,3,7}
  move({0,1,3,7},a) = {2,4,7}
  ε-closure({2,4,7}) = {2,4,7}
  move({2,4,7},a) = {7}
  ε-closure({7}) = {7}
  move({7},b) = {8}
  ε-closure({8}) = {8}
  move({8},a) = ∅

These functions are also used to simulate NFAs, as in the earlier examples (state sets {0,1,3,7}, {2,4,7}, {7}, {8} on input a a b a).
Simulating an NFA using ε-closure and move

S := ε-closure({s0})
Sprev := ∅
a := nextchar()
while S ≠ ∅ do
  Sprev := S
  S := ε-closure(move(S,a))
  a := nextchar()
end do
if Sprev ∩ F ≠ ∅ then
  execute the action of the accepting state in Sprev
  return "yes"
else return "no"
The Subset Construction Algorithm

Initially, ε-closure(s0) is the only state in Dstates, and it is unmarked
while there is an unmarked state T in Dstates do
  mark T
  for each input symbol a ∈ Σ do
    U := ε-closure(move(T,a))
    if U is not in Dstates then
      add U as an unmarked state to Dstates
    end if
    Dtran[T,a] := U
  end do
end do
Subset Construction Example 1

[Diagram: the Thompson NFA for (a|b)*abb, with states 0–10 and ε-transitions, is converted to a DFA.]

Dstates:
  A = {0,1,2,4,7}
  B = {1,2,3,4,6,7,8}
  C = {1,2,4,5,6,7}
  D = {1,2,4,5,6,7,9}
  E = {1,2,4,5,6,7,10}

[Resulting DFA: A →a B, A →b C; B →a B, B →b D; C →a B, C →b C; D →a B, D →b E; E →a B, E →b C; E is accepting.]
Subset Construction Example 2

[Diagram: the combined NFA for a { action1 }, abb { action2 }, a*b+ { action3 } is converted to a DFA.]

Dstates:
  A = {0,1,3,7}
  B = {2,4,7}   accepting: action1
  C = {8}       accepting: action3
  D = {7}
  E = {5,8}     accepting: action3
  F = {6,8}     accepting: action2 (listed first, so it wins over action3)

[Resulting DFA: A →a B, A →b C; B →a D, B →b E; C →b C; D →a D, D →b C; E →b F; F →b C.]
Minimizing the Number of States of a DFA

[Diagram: the five-state DFA A, B, C, D, E from Example 1 is minimized to four states by merging the equivalent states A and C. Minimized DFA: A →a B, A →b A; B →a B, B →b D; D →a B, D →b E; E →a B, E →b A; E is accepting.]
From Regular Expression to DFA Directly

• The important states of an NFA are those without an ε-transition, that is, if move({s},a) ≠ ∅ for some a then s is an important state
• The subset construction algorithm uses only the important states when it determines ε-closure(move(T,a))
From Regular Expression to DFA Directly (Algorithm)

• Augment the regular expression r with a special end symbol # to make the accepting state important: the new expression is r#
• Construct a syntax tree for r#
• Traverse the tree to construct the functions nullable, firstpos, lastpos, and followpos
From Regular Expression to DFA Directly: Syntax Tree of (a|b)*abb#

[Diagram: the syntax tree has concatenation nodes along its spine; the leftmost subtree is the closure node * over an alternation node | whose leaves are positions 1 (a) and 2 (b); the remaining leaves along the spine are positions 3 (a), 4 (b), 5 (b), and 6 (#). Position numbers are assigned to the leaves (for leaves ≠ ε).]
From Regular Expression to DFA Directly: Annotating the Tree

• nullable(n): true when the subtree at node n generates languages including the empty string
• firstpos(n): the set of positions that can match the first symbol of a string generated by the subtree at node n
• lastpos(n): the set of positions that can match the last symbol of a string generated by the subtree at node n
• followpos(i): the set of positions that can follow position i in the tree
From Regular Expression to DFA Directly: Annotating the Tree

  Node n                      | nullable(n)       | firstpos(n)                 | lastpos(n)
  ----------------------------+-------------------+-----------------------------+---------------------------
  Leaf ε                      | true              | ∅                           | ∅
  Leaf i                      | false             | {i}                         | {i}
  | node with children c1, c2 | nullable(c1) or   | firstpos(c1) ∪ firstpos(c2) | lastpos(c1) ∪ lastpos(c2)
                              | nullable(c2)      |                             |
  • node with children c1, c2 | nullable(c1) and  | if nullable(c1) then        | if nullable(c2) then
                              | nullable(c2)      | firstpos(c1) ∪ firstpos(c2) | lastpos(c1) ∪ lastpos(c2)
                              |                   | else firstpos(c1)           | else lastpos(c2)
  * node with child c1        | true              | firstpos(c1)                | lastpos(c1)
From Regular Expression to DFA Directly: Syntax Tree of (a|b)*abb# (Annotated)

[Diagram: each node carries firstpos on its left and lastpos on its right; nullable is true only at the * node. Leaves: a {1}{1}, b {2}{2}, a {3}{3}, b {4}{4}, b {5}{5}, # {6}{6}. The | node and the * node both have firstpos = lastpos = {1, 2}. The concatenation nodes along the spine all have firstpos {1, 2, 3}; their lastpos values are {3}, {4}, {5}, and {6} from the bottom up.]
From Regular Expression to DFA Directly: followpos

for each node n in the tree do
  if n is a cat-node with left child c1 and right child c2 then
    for each i in lastpos(c1) do
      followpos(i) := followpos(i) ∪ firstpos(c2)
    end do
  else if n is a star-node then
    for each i in lastpos(n) do
      followpos(i) := followpos(i) ∪ firstpos(n)
    end do
  end if
end do
From Regular Expression to DFA Directly: Algorithm

s0 := firstpos(root) where root is the root of the syntax tree
Dstates := {s0}, unmarked
while there is an unmarked state T in Dstates do
  mark T
  for each input symbol a ∈ Σ do
    let U be the set of positions that are in followpos(p)
    for some position p in T such that the symbol at position p is a
    if U is not empty and not in Dstates then
      add U as an unmarked state to Dstates
    end if
    Dtran[T,a] := U
  end do
end do
From Regular Expression to DFA Directly: Example

  Node | followpos
  -----+----------
   1   | {1, 2, 3}
   2   | {1, 2, 3}
   3   | {4}
   4   | {5}
   5   | {6}
   6   | -

[Resulting DFA: start state {1,2,3}; {1,2,3} →a {1,2,3,4}, →b {1,2,3}; {1,2,3,4} →a {1,2,3,4}, →b {1,2,3,5}; {1,2,3,5} →a {1,2,3,4}, →b {1,2,3,6}; {1,2,3,6} →a {1,2,3,4}, →b {1,2,3}; {1,2,3,6} is accepting.]
Time-Space Tradeoffs

  Automaton | Space (worst case) | Time (worst case)
  ----------+--------------------+------------------
  NFA       | O(|r|)             | O(|r| × |x|)
  DFA       | O(2^|r|)           | O(|x|)