Lexical Analysis
by Neng-Fa Zhou
Lexical Analysis
Why separate lexical and syntax analyses?
– simpler design
– efficiency
– portability
Tokens, Patterns, Lexemes
– Tokens
  • Terminal symbols in the grammar
– Patterns
  • Description of a class of tokens
– Lexemes
  • Words in the source program
Languages
– Fixed and finite alphabet (vocabulary)
– Finite-length sentences
– Possibly infinite number of sentences
Examples
– Natural numbers {1,2,3,...,10,11,...}
– Strings over {a,b}: a^n b a^n
Terms on parts of a string
– prefix, suffix, substring, proper ...
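The terms for parts of a string can be made precise with small string predicates. The C sketch below is illustrative only, and its function names are hypothetical. It follows the usual convention that a proper prefix of s is a prefix that is neither ε (the empty string) nor s itself. Proper suffixes and substrings are defined the same way.

```c
#include <stdbool.h>
#include <string.h>

/* p is a prefix of s: s begins with p (ε and s itself both qualify). */
bool is_prefix(const char *p, const char *s) {
    return strncmp(p, s, strlen(p)) == 0;
}

/* x is a suffix of s: s ends with x. */
bool is_suffix(const char *x, const char *s) {
    size_t n = strlen(x), m = strlen(s);
    return n <= m && strcmp(s + (m - n), x) == 0;
}

/* x is a substring of s: x occurs contiguously somewhere in s. */
bool is_substring(const char *x, const char *s) {
    return strstr(s, x) != NULL;
}

/* Proper prefix: a prefix that is neither ε nor s itself. */
bool is_proper_prefix(const char *p, const char *s) {
    return is_prefix(p, s) && p[0] != '\0' && strcmp(p, s) != 0;
}
```

For s = banana, "ban" is a proper prefix, "nana" is a suffix and "nan" is a substring.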
Operations on Languages
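The slide's table of operations is not reproduced in this transcript. The examples that follow rely on the standard definitions, written here for languages L and M. L^i denotes i-fold concatenation of L.

```latex
\begin{aligned}
L \cup M &= \{\, s \mid s \in L \text{ or } s \in M \,\} && \text{union}\\
LM &= \{\, st \mid s \in L \text{ and } t \in M \,\} && \text{concatenation}\\
L^0 &= \{\epsilon\}, \quad L^i = L^{i-1}L && \text{powers}\\
L^* &= \textstyle\bigcup_{i \ge 0} L^i && \text{Kleene closure}\\
L^+ &= \textstyle\bigcup_{i \ge 1} L^i && \text{positive closure}
\end{aligned}
```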
Examples
L = {A,B,...,Z,a,b,...,z}
D = {0,1,...,9}
– L ∪ D: the set of letters and digits
– LD: a letter followed by a digit
– L^4: four-letter strings
– L*: all strings of letters, including ε
– L(L ∪ D)*: strings of letters and digits beginning with a letter
– D+: strings of one or more digits
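Each example language can be recognized by a few lines of C. The sketch below is illustrative: the function names are hypothetical, and "letter" and "digit" are assumed to mean ASCII characters in the default "C" locale.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* L = letters A-Z, a-z; D = digits 0-9 (default "C" locale). */
static bool in_L(char c) { return isalpha((unsigned char)c) != 0; }
static bool in_D(char c) { return isdigit((unsigned char)c) != 0; }

/* L ∪ D: a single letter or digit */
bool in_L_union_D(const char *s) { return strlen(s) == 1 && (in_L(s[0]) || in_D(s[0])); }

/* LD: a letter followed by a digit */
bool in_LD(const char *s) { return strlen(s) == 2 && in_L(s[0]) && in_D(s[1]); }

/* L*: any number of letters, including the empty string */
bool in_L_star(const char *s) {
    for (; *s; s++)
        if (!in_L(*s)) return false;
    return true;
}

/* L^4: exactly four letters */
bool in_L4(const char *s) { return strlen(s) == 4 && in_L_star(s); }

/* L(L ∪ D)*: a letter, then any mix of letters and digits */
bool in_L_LD_star(const char *s) {
    if (!in_L(s[0])) return false;
    for (s++; *s; s++)
        if (!in_L(*s) && !in_D(*s)) return false;
    return true;
}

/* D+: one or more digits */
bool in_D_plus(const char *s) {
    if (*s == '\0') return false;
    for (; *s; s++)
        if (!in_D(*s)) return false;
    return true;
}
```

L(L ∪ D)* is the usual shape of identifiers, and D+ is the usual shape of unsigned integers.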
Regular Expression (RE)
– ε is a RE
– a symbol in Σ is a RE
– Let r and s be REs. Then:
  • (r)|(s): or
  • (r)(s): concatenation
  • (r)*: zero or more instances
  • (r)+: one or more instances
  • (r)?: zero or one instance
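Precedence of Operators (from high to low)
– r*, r+, r?
– rs
– r|s
All operators are left associative.
Examples (Σ = {a,b})
1. a|b: {a, b}
2. (a|b)(a|b): {aa, ab, ba, bb}
3. a*: {ε, a, aa, aaa, ...}
4. (a|b)*: all strings of a's and b's, including ε
5. a|a*b: read as a|(a*b), that is, {a, b, ab, aab, ...}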
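The precedence rules can be checked against a POSIX regular-expression engine. Its extended syntax (ERE) uses the same |, * and parentheses with the same precedence. In the sketch below, the helper full_match is hypothetical. regcomp, regexec and regfree are the POSIX <regex.h> API, which is available on Unix-like systems but is not part of ISO C. The caller anchors the pattern with ^ and $ so that the whole string must match.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* True iff s matches the POSIX extended regular expression re.
   Anchor re with ^...$ to require a whole-string match. */
bool full_match(const char *re, const char *s) {
    regex_t compiled;
    if (regcomp(&compiled, re, REG_EXTENDED | REG_NOSUB) != 0)
        return false;                   /* treat a bad pattern as no match */
    bool ok = regexec(&compiled, s, 0, NULL, 0) == 0;
    regfree(&compiled);
    return ok;
}
```

For example, full_match("^(a|a*b)$", "a") is true. It would be false if the expression grouped as (a|a*)b, so the engine agrees that concatenation binds tighter than |.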
Algebraic Properties of RE
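The slide's table of identities is not reproduced in this transcript. The standard algebraic laws for regular expressions include the following. Here "=" means that both sides denote the same language.

```latex
\begin{aligned}
r \mid s &= s \mid r && \text{alternation is commutative}\\
r \mid (s \mid t) &= (r \mid s) \mid t && \text{alternation is associative}\\
r(st) &= (rs)t && \text{concatenation is associative}\\
r(s \mid t) &= rs \mid rt, \quad (s \mid t)r = sr \mid tr && \text{concatenation distributes over alternation}\\
\epsilon r &= r\epsilon = r && \epsilon \text{ is the identity for concatenation}\\
r^* &= (r \mid \epsilon)^* && \epsilon \text{ is guaranteed in a closure}\\
r^{**} &= r^* && \text{closure is idempotent}
\end{aligned}
```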
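Regular Definitions
d1 → r1
d2 → r2
....
dn → rn
– each di is a distinct name, and each ri is a RE over Σ ∪ {d1,d2,...,di-1}
– not recursive: a definition may use only the names defined before it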
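As a worked example, the number pattern used in Example-2 can be written as a regular definition. D and INT follow that example's names. The name NUM is introduced here for illustration, and \texttt{.} stands for a literal dot.

```latex
\begin{aligned}
D &\to 0 \mid 1 \mid \cdots \mid 9\\
INT &\to D\, D^*\\
NUM &\to INT\ (\,\texttt{.}\ INT\ ((e \mid E)\ (+ \mid -)?\ INT)?\,)?
\end{aligned}
```

Each right-hand side uses only symbols of Σ and names defined on earlier lines, so NUM can be expanded into a single RE by substitution.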
Example-1
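```lex
%{
#include <stdio.h>
int num_lines = 0, num_chars = 0;
%}
%%
\n      ++num_lines; ++num_chars;
.       ++num_chars;
%%
int main(void)
{
    yylex();
    printf("# of lines = %d, # of chars = %d\n", num_lines, num_chars);
    return 0;
}

/* Return 1 at end of file: there is no more input. Returning 0 would
   tell the scanner to keep reading yyin, which is still at end of file. */
int yywrap(void) { return 1; }
```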
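Example-2
```lex
D       [0-9]
INT     {D}{D}*
%%
{INT}("."{INT}((e|E)("+"|-)?{INT})?)?   { printf("valid %s\n", yytext); }
.                                       { printf("unrecognized %s\n", yytext); }
%%
int main(int argc, char *argv[])
{
    ++argv, --argc;                     /* skip over the program name */
    if (argc > 0) {
        yyin = fopen(argv[0], "r");
        if (yyin == NULL) { perror(argv[0]); return 1; }
    } else {
        yyin = stdin;
    }
    yylex();
    return 0;
}

int yywrap(void) { return 1; }          /* no more input at end of file */
```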
java.util.regex
```java
import java.util.regex.*;

class Number {
    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println("usage: java Number <string>");
            return;
        }
        String regExNum = "\\d+(\\.\\d+((e|E)(\\+|-)?\\d+)?)?";
        if (Pattern.matches(regExNum, args[0]))
            System.out.println("valid");
        else
            System.out.println("invalid");
    }
}
```
String Pattern Matching in Perl
```perl
print "Input a string :";
$_ = <STDIN>;
chomp($_);
if (/^[0-9]+(\.[0-9]+((e|E)(\+|-)?[0-9]+)?)?$/) {
    print "valid\n";
} else {
    print "invalid\n";
}
```
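The Lex, java.util.regex and Perl programs above all test the same number pattern. For comparison, the C sketch below checks the same pattern without a regex library. It walks the pattern as an explicit state machine, which anticipates the finite automata introduced next. The function name valid_number and the state numbering are illustrative.

```c
#include <ctype.h>
#include <stdbool.h>

/* Recognizes digit+ ( "." digit+ ( (e|E) (+|-)? digit+ )? )?
   States: 0 start, 1 integer part, 2 after '.', 3 fraction,
   4 after e/E, 5 after sign, 6 exponent digits. */
bool valid_number(const char *s) {
    int state = 0;
    for (; *s; s++) {
        unsigned char c = (unsigned char)*s;
        switch (state) {
        case 0: if (isdigit(c)) state = 1; else return false; break;
        case 1: if (isdigit(c)) state = 1; else if (c == '.') state = 2; else return false; break;
        case 2: if (isdigit(c)) state = 3; else return false; break;
        case 3: if (isdigit(c)) state = 3;
                else if (c == 'e' || c == 'E') state = 4; else return false; break;
        case 4: if (isdigit(c)) state = 6;
                else if (c == '+' || c == '-') state = 5; else return false; break;
        case 5: if (isdigit(c)) state = 6; else return false; break;
        case 6: if (isdigit(c)) state = 6; else return false; break;
        }
    }
    return state == 1 || state == 3 || state == 6;   /* accepting states */
}
```

As in the regex versions, an exponent is only allowed after a fraction, so "1e5" is rejected while "1.0e5" is accepted.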
Finite Automata
Nondeterministic finite automaton (NFA)
NFA = (S,T,s0,F)
– S: a set of states
– T: a transition mapping from a state and an input symbol (or ε) to a set of states
– s0: the start state
– F: final states or accepting states
Example
Deterministic Finite Automata (DFA)
– T: a transition function. There is only one arc going out from each state on each input symbol, and there are no ε-transitions.
Simulating a DFA
```
s = s0;
c = nextchar;
while (c != eof) {
    s = move(s, c);
    c = nextchar;
}
if (s is in F)
    return "yes";
else
    return "no";
```
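The simulation loop above becomes concrete once move is a lookup table. The C sketch below assumes a hypothetical four-state DFA over {a, b} that accepts strings ending in abb, with state 3 as the only accepting state. The table and the function name dfa_accepts are illustrative.

```c
#include <stdbool.h>

/* move_tab[s][c]: next state from s; column 0 is 'a', column 1 is 'b'. */
static const int move_tab[4][2] = {
    {1, 0},   /* 0: start, no progress toward "abb" */
    {1, 2},   /* 1: just saw "a"                    */
    {1, 3},   /* 2: just saw "ab"                   */
    {1, 0},   /* 3: just saw "abb" (accepting)      */
};

bool dfa_accepts(const char *w) {
    int s = 0;                                   /* s = s0 */
    for (; *w != '\0'; w++) {                    /* c = nextchar, until eof */
        if (*w != 'a' && *w != 'b') return false;    /* symbol outside the alphabet */
        s = move_tab[s][*w == 'b'];              /* s = move(s, c) */
    }
    return s == 3;                               /* is s in F? */
}
```

Each input character costs exactly one table lookup, which is why lexical analyzers prefer running a DFA over simulating an NFA.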
From RE to NFA
– ε, and a in Σ
– s|t
From RE to NFA (cont.)
– st
– s*
Example: (a|b)*a
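The RE-to-NFA rules and the (a|b)*a example can be tried out with a compact C sketch. To keep it short, the sketch makes several assumptions:
– the RE is already in postfix form (for (a|b)*a that is ab|*a. with . as explicit concatenation);
– every other character is a symbol;
– there is no error checking;
– all names are hypothetical.

For st, it joins the two fragments with an ε-arc instead of merging the accepting state of N(s) with the start state of N(t). This is a common variant that recognizes the same language. The NFA is then simulated by tracking the ε-closure of the current set of states.

```c
#include <stdbool.h>
#include <string.h>

/* A state has either one arc on a symbol (sym != EPS, target out1) or up
   to two ε-arcs (sym == EPS, targets out1/out2, -1 = none). */
enum { MAXS = 256, EPS = 0 };
typedef struct { int sym, out1, out2; } State;
typedef struct { int start, accept; } Frag;      /* an NFA fragment N(r) */

static State st[MAXS];                           /* no overflow check */
static int nstates;

static int new_state(int sym, int out1, int out2) {
    st[nstates] = (State){sym, out1, out2};
    return nstates++;
}

/* Postfix input: '.' concatenation, '|' union, '*' closure, else a symbol. */
Frag thompson(const char *postfix) {
    Frag stack[64];
    int sp = 0;
    nstates = 0;
    for (const char *p = postfix; *p; p++) {
        if (*p == '.' || *p == '|') {
            Frag t = stack[--sp];
            Frag s = stack[--sp];
            if (*p == '.') {                     /* st: accept(s) --ε--> start(t) */
                st[s.accept].out1 = t.start;
                stack[sp++] = (Frag){s.start, t.accept};
            } else {                             /* s|t: new start and accept */
                int f = new_state(EPS, -1, -1);
                int i = new_state(EPS, s.start, t.start);
                st[s.accept].out1 = f;
                st[t.accept].out1 = f;
                stack[sp++] = (Frag){i, f};
            }
        } else if (*p == '*') {                  /* s*: repeat s or skip it */
            Frag s = stack[--sp];
            int f = new_state(EPS, -1, -1);
            int i = new_state(EPS, s.start, f);
            st[s.accept].out1 = s.start;
            st[s.accept].out2 = f;
            stack[sp++] = (Frag){i, f};
        } else {                                 /* a in Σ: start --a--> accept */
            int f = new_state(EPS, -1, -1);
            int s = new_state((unsigned char)*p, f, -1);
            stack[sp++] = (Frag){s, f};
        }
    }
    return stack[--sp];
}

/* Adds s and every state reachable from it by ε-arcs alone. */
static void add_closure(bool set[], int s) {
    if (s < 0 || set[s]) return;
    set[s] = true;
    if (st[s].sym == EPS) {
        add_closure(set, st[s].out1);
        add_closure(set, st[s].out2);
    }
}

bool nfa_match(Frag n, const char *w) {
    bool cur[MAXS] = {false}, next[MAXS];
    add_closure(cur, n.start);
    for (; *w; w++) {
        memset(next, 0, sizeof next);
        for (int s = 0; s < nstates; s++)
            if (cur[s] && st[s].sym == (unsigned char)*w)
                add_closure(next, st[s].out1);
        memcpy(cur, next, sizeof cur);
    }
    return cur[n.accept];
}
```

For ab|*a. this variant creates 10 states, two per symbol and two per operator. nfa_match then accepts exactly the strings over {a, b} that end in a.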
Building Lexical Analyzer
RE → NFA → DFA → Emulator
– RE to NFA: Algorithm 3.23 (Thompson's construction)
– NFA to DFA: Algorithm 3.32 (Subset construction)
Conversion of an NFA into a DFA
Intuition
– move(s,a) is a function in a DFA
– move(s,a) is a mapping in an NFA (it may yield a set of states)
– A state reachable from s0 in the DFA on an input string corresponds to the set of NFA states that are reachable on the same string.
Computation of ε-Closure
ε-closure(T): the set of NFA states reachable from some NFA state s in T by ε-transitions alone.
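ε-closure(T) can be computed with a stack. Push every state of T, then repeatedly pop a state and add any ε-successors not yet in the closure. The C sketch below makes two assumptions that do not come from the slide. First, it uses a 32-bit mask as the state set, so the NFA has at most 32 states. Second, its example NFA is the textbook Thompson NFA for (a|b)*abb, whose states are numbered 0 to 10.

```c
#include <stdint.h>

enum { NSTATES = 11 };

/* eps[s]: bitmask of states reachable from s by one ε-arc
   (hypothetical example: Thompson NFA for (a|b)*abb). */
static const uint32_t eps[NSTATES] = {
    [0] = 1u << 1 | 1u << 7,
    [1] = 1u << 2 | 1u << 4,
    [3] = 1u << 6,
    [5] = 1u << 6,
    [6] = 1u << 1 | 1u << 7,
};

uint32_t eps_closure(uint32_t T) {
    int stack[NSTATES], sp = 0;          /* each state is pushed at most once */
    uint32_t closure = T;
    for (int s = 0; s < NSTATES; s++)
        if (T >> s & 1u) stack[sp++] = s;
    while (sp > 0) {
        int t = stack[--sp];
        for (int u = 0; u < NSTATES; u++)
            if ((eps[t] >> u & 1u) && !(closure >> u & 1u)) {
                closure |= 1u << u;      /* u is newly reachable by ε */
                stack[sp++] = u;
            }
    }
    return closure;
}
```

For this NFA, ε-closure({0}) = {0, 1, 2, 4, 7}, which is the start state of the DFA built by the subset construction.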
From an NFA to a DFA (The subset construction)
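A minimal C sketch of the subset construction follows. NFA state sets are 32-bit masks. The example NFA is the same hypothetical textbook Thompson NFA for (a|b)*abb. The array names, the symbol encoding (0 for a, 1 for b) and the fixed-point way of computing ε-closure are all assumptions for illustration. The loop index i plays the role of the worklist of unmarked DFA states. For each unmarked state T and symbol c, U = ε-closure(move(T, c)) becomes a new DFA state if it has not been seen before.

```c
#include <stdint.h>

enum { NSTATES = 11, NSYMS = 2, MAXD = 64 };    /* symbols: 0 = 'a', 1 = 'b' */

static const uint32_t eps[NSTATES] = {          /* ε-successors */
    [0] = 1u << 1 | 1u << 7, [1] = 1u << 2 | 1u << 4,
    [3] = 1u << 6, [5] = 1u << 6, [6] = 1u << 1 | 1u << 7,
};
static const uint32_t on[NSTATES][NSYMS] = {    /* successors on a symbol */
    [2] = {1u << 3, 0}, [4] = {0, 1u << 5},
    [7] = {1u << 8, 0}, [8] = {0, 1u << 9}, [9] = {0, 1u << 10},
};

static uint32_t eps_closure(uint32_t T) {       /* add ε-successors until stable */
    uint32_t prev;
    do {
        prev = T;
        for (int s = 0; s < NSTATES; s++)
            if (T >> s & 1u) T |= eps[s];
    } while (T != prev);
    return T;
}

static uint32_t move(uint32_t T, int c) {
    uint32_t r = 0;
    for (int s = 0; s < NSTATES; s++)
        if (T >> s & 1u) r |= on[s][c];
    return r;
}

uint32_t dstates[MAXD];   /* each DFA state is a set of NFA states (no overflow check) */
int dtran[MAXD][NSYMS];   /* DFA transition table */
int ndstates;

void subset_construction(int s0) {
    ndstates = 0;
    dstates[ndstates++] = eps_closure(1u << s0);
    for (int i = 0; i < ndstates; i++) {        /* i: next unmarked state */
        for (int c = 0; c < NSYMS; c++) {
            uint32_t U = eps_closure(move(dstates[i], c));
            int j = 0;
            while (j < ndstates && dstates[j] != U) j++;
            if (j == ndstates) dstates[ndstates++] = U;   /* new DFA state */
            dtran[i][c] = j;
        }
    }
}
```

For this NFA the construction produces five DFA states. Only the last one contains NFA state 10, so it is the only accepting state.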
Example
(figure: an NFA and the DFA obtained from it)
Algorithm 3.39 (minimizing the number of states of a DFA)
```
Π := {F, S-F};
do begin
    Πnew := Π;
    for each group G in Π do begin
        partition G into subgroups such that two states s and t
        of G are in the same subgroup iff for all input symbols a,
        s and t have transitions on a to states in the same group of Π;
        replace G in Πnew by the set of all subgroups formed
    end;
    if (Πnew = Π) return;
    Π := Πnew
end;
```
Example
| state | a | b |
|-------|---|---|
| AC | B | AC |
| B | B | D |
| D | B | E |
| E | B | AC |
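Algorithm 3.39 can be sketched as repeated partition refinement. The C code below is a hypothetical illustration on a five-state DFA A to E over {a, b}, with E accepting. Its transition table is an assumption, chosen to be consistent with the minimized table above, in which A and C merge. Each round regroups the states by three values: their current group, the group reached on a, and the group reached on b. The loop stops when no group splits. This replaces the explicit Πnew of the pseudocode with a fresh group numbering.

```c
#include <stdbool.h>
#include <string.h>

enum { N = 5, NSYMS = 2 };   /* states A..E = 0..4; symbols a = 0, b = 1 */

static const int dtran[N][NSYMS] = {
    {1, 2},   /* A */
    {1, 3},   /* B */
    {1, 2},   /* C */
    {1, 4},   /* D */
    {1, 2},   /* E */
};
static const bool accepting[N] = {false, false, false, false, true};

/* s and t stay together iff they share a group now and, on every symbol,
   move to states that share a group. */
static bool same_signature(const int group[N], int s, int t) {
    if (group[s] != group[t]) return false;
    for (int c = 0; c < NSYMS; c++)
        if (group[dtran[s][c]] != group[dtran[t][c]]) return false;
    return true;
}

/* Starts from {F, S-F}; returns the number of groups in the final
   partition and stores each state's group in group[]. */
int minimize(int group[N]) {
    int prev = -1;
    for (int s = 0; s < N; s++) group[s] = accepting[s] ? 1 : 0;
    for (;;) {
        int next[N], ngroups = 0;
        for (int s = 0; s < N; s++) {
            int t = 0;
            while (t < s && !same_signature(group, s, t)) t++;
            next[s] = (t < s) ? next[t] : ngroups++;   /* join t's group or open one */
        }
        memcpy(group, next, sizeof next);
        if (ngroups == prev) return ngroups;   /* nothing split: partition is stable */
        prev = ngroups;
    }
}
```

Each round only refines the previous partition. An unchanged group count therefore means an unchanged partition. Here the refinement goes from 2 groups to 3 to 4 and then stops with A and C together.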
Construct a DFA Directly from a Regular Expression
Implementation Issues
Input buffering
– Read in characters one by one
  • Unable to look ahead
  • Inefficient
– Read in a whole string and store it in memory
  • Requires a big buffer
– Buffer pairs
Buffer Pairs
Use Sentinels
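With buffer pairs, the input is read into two halves alternately. A sentinel character ends each half, so the common case needs a single comparison per character: is this the sentinel? Only when it is does the scanner decide between reloading the other half and reporting the real end of input.

The C sketch below is a rough illustration under several assumptions:
– '\0' is used as the sentinel and never occurs in the input;
– the half size is tiny so that reloads are easy to trace;
– the lexemeBegin pointer, which a real scanner also keeps and must not overwrite on reload, is omitted.

HALF, load, init_buffer and next_char are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

enum { HALF = 4 };                /* tiny half size, for illustration */
#define SENTINEL '\0'             /* assumption: never appears in the input */

/* [ first half | sentinel | second half | sentinel ] */
static char buf[2 * (HALF + 1)];
static char *forward;             /* next character to hand out */
static FILE *src;

/* Fill one half and mark its end. A short read leaves the sentinel
   inside the half, which later signals the real end of input. */
static void load(char *half) {
    size_t n = fread(half, 1, HALF, src);
    half[n] = SENTINEL;
}

void init_buffer(FILE *in) {
    src = in;
    load(buf);
    forward = buf;
}

int next_char(void) {
    for (;;) {
        char c = *forward++;
        if (c != SENTINEL)                             /* the usual case */
            return (unsigned char)c;
        if (forward == buf + HALF + 1) {               /* end of first half */
            load(buf + HALF + 1);                      /* forward already points there */
        } else if (forward == buf + 2 * (HALF + 1)) {  /* end of second half */
            load(buf);
            forward = buf;
        } else {                                       /* sentinel inside a half */
            forward--;                                 /* keep returning EOF */
            return EOF;
        }
    }
}
```

With HALF = 4, reading "hello, world" fills the halves as "hell", "o, w" and then "orld". After the final half, a short read places the sentinel inside a half, and next_char returns EOF.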
Lexical Analyzer
Lex
A tool for automatically generating lexical analyzers
Lex Specifications
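```
declarations
%%
translation rules
%%
auxiliary procedures
```
The translation rules have the form
```
p1    {action1}
p2    {action2}
...
pn    {actionn}
```
where each pi is a regular expression and each actioni is the C code to run when pi matches.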
Lex Regular Expressions
yylex()
```
yylex() {
    switch (pattern_match()) {
    case 1: {action1}
    case 2: {action2}
    ...
    case n: {actionn}
    }
}
```
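The pattern_match() step in the skeleton above hides how Lex resolves conflicts between rules. It prefers the longest match, and among equally long matches it prefers the rule listed first. The C sketch below illustrates that policy with three hypothetical rules taken loosely from the Example that follows: a keyword rule, an identifier rule [a-z][a-z0-9]* and an integer rule [0-9]+. It is not how Lex-generated scanners work internally; they run a single combined DFA rather than trying rules one by one.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Each rule returns the length of the longest prefix of s it matches (0 = none). */
static size_t match_keyword(const char *s) {      /* if|then|begin|end */
    static const char *const kw[] = {"if", "then", "begin", "end"};
    size_t best = 0;
    for (size_t i = 0; i < sizeof kw / sizeof kw[0]; i++) {
        size_t n = strlen(kw[i]);
        if (n > best && strncmp(s, kw[i], n) == 0) best = n;
    }
    return best;
}

static size_t match_id(const char *s) {           /* [a-z][a-z0-9]* */
    if (!islower((unsigned char)s[0])) return 0;
    size_t n = 1;
    while (islower((unsigned char)s[n]) || isdigit((unsigned char)s[n])) n++;
    return n;
}

static size_t match_int(const char *s) {          /* [0-9]+ */
    size_t n = 0;
    while (isdigit((unsigned char)s[n])) n++;
    return n;
}

/* Rules in the order they appear in the specification. */
static size_t (*const rules[])(const char *) = {match_keyword, match_id, match_int};

/* Returns the 1-based number of the winning rule at s (0 if none) and
   stores the lexeme length in *len. A later rule must be strictly
   longer to win, so ties go to the earlier rule. */
int pattern_match(const char *s, size_t *len) {
    int winner = 0;
    size_t best = 0;
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t n = rules[i](s);
        if (n > best) {
            best = n;
            winner = (int)i + 1;
        }
    }
    *len = best;
    return winner;
}
```

On "if x" the keyword and identifier rules both match two characters, so the keyword rule wins because it is listed first. On "ifx" the identifier rule matches three characters and wins. This is why keywords are listed before {ID} in the Example.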
Example
```lex
%{
#include <stdlib.h>   /* atoi, atof */
%}
DIGIT    [0-9]
ID       [a-z][a-z0-9]*
%%
{DIGIT}+                { printf("An integer:%s(%d)\n", yytext, atoi(yytext)); }
{DIGIT}+"."{DIGIT}*     { printf("A float: %s (%g)\n", yytext, atof(yytext)); }
if|then|begin|end|procedure|function   { printf("A keyword: %s\n", yytext); }
{ID}                    { printf("An identifier %s\n", yytext); }
"+"|"-"|"*"|"/"         { printf("An operator %s\n", yytext); }
"{"[^}\n]*"}"           { /* eat up one-line comments */ }
[ \t\n]+                { /* eat up white space */ }
.                       { printf("Unrecognized character: %s\n", yytext); }
%%
int main(int argc, char *argv[])
{
    ++argv, --argc;                     /* skip over the program name */
    if (argc > 0) {
        yyin = fopen(argv[0], "r");
        if (yyin == NULL) { perror(argv[0]); return 1; }
    } else {
        yyin = stdin;
    }
    yylex();
    return 0;
}

int yywrap(void) { return 1; }          /* no more input at end of file */
```