Compiler Construction Dr. Naveed Ejaz Lecture 5. Lexical Analysis
TRANSCRIPT
![Page 1: Compiler Construction Dr. Naveed Ejaz Lecture 5. Lexical Analysis](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f345503460f94c518a6/html5/thumbnails/1.jpg)
Compiler Construction
Dr. Naveed Ejaz
Lecture 5
![Page 2: Compiler Construction Dr. Naveed Ejaz Lecture 5. Lexical Analysis](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f345503460f94c518a6/html5/thumbnails/2.jpg)
Lexical Analysis
![Page 3: Compiler Construction Dr. Naveed Ejaz Lecture 5. Lexical Analysis](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f345503460f94c518a6/html5/thumbnails/3.jpg)
Recall: Front-End
Output of lexical analysis is a stream of tokens.
[Diagram: source code -> scanner -> tokens -> parser -> IR; errors are reported by the front end]
![Page 4: Compiler Construction Dr. Naveed Ejaz Lecture 5. Lexical Analysis](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f345503460f94c518a6/html5/thumbnails/4.jpg)
Tokens
Example:

    if( i == j )
        z = 0;
    else
        z = 1;
![Page 5: Compiler Construction Dr. Naveed Ejaz Lecture 5. Lexical Analysis](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f345503460f94c518a6/html5/thumbnails/5.jpg)
Tokens
Input is just a sequence of characters (here \b stands for a blank, \n for a newline and \t for a tab):

    if ( \b i \b = = \b j \n \t ....
![Page 6: Compiler Construction Dr. Naveed Ejaz Lecture 5. Lexical Analysis](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f345503460f94c518a6/html5/thumbnails/6.jpg)
Tokens
Goal: partition the input string into substrings and classify them according to their role.
![Page 7: Compiler Construction Dr. Naveed Ejaz Lecture 5. Lexical Analysis](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f345503460f94c518a6/html5/thumbnails/7.jpg)
Tokens
A token is a syntactic category.
Natural language: “He wrote the program”
Words: “He”, “wrote”, “the”, “program”
![Page 8: Compiler Construction Dr. Naveed Ejaz Lecture 5. Lexical Analysis](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f345503460f94c518a6/html5/thumbnails/8.jpg)
Tokens
Programming language: “if(b == 0) a = b”
Words: “if”, “(”, “b”, “==”, “0”, “)”, “a”, “=”, “b”
![Page 9: Compiler Construction Dr. Naveed Ejaz Lecture 5. Lexical Analysis](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f345503460f94c518a6/html5/thumbnails/9.jpg)
Tokens
Identifiers: x y11 maxsize
Keywords: if else while for
Integers: 2 1000 -44 5L
Floats: 2.0 0.0034 1e5
Symbols: ( ) + * / { } < > ==
Strings: “enter x” “error”
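The categories listed above could be represented in code as an enumeration paired with the matched text. The following Java sketch is illustrative only: the names `TokenKind`, `Token` and `TokenClassifier.classify` are assumptions that do not come from the lecture, and the patterns cover just the example lexemes shown on the slide.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical names throughout; the lecture does not prescribe this design.
enum TokenKind { IDENTIFIER, KEYWORD, INTEGER, FLOAT, SYMBOL, STRING, UNKNOWN }

final class Token {
    final TokenKind kind;   // the syntactic category
    final String text;      // the substring that was matched

    Token(TokenKind kind, String text) {
        this.kind = kind;
        this.text = text;
    }

    @Override
    public String toString() {
        return kind + "(" + text + ")";
    }
}

final class TokenClassifier {
    private static final Set<String> KEYWORDS =
        new HashSet<>(Arrays.asList("if", "else", "while", "for"));
    private static final Set<String> SYMBOLS =
        new HashSet<>(Arrays.asList("(", ")", "+", "*", "/", "{", "}", "<", ">", "=="));

    // Classify a substring that has already been cut out of the input.
    static Token classify(String s) {
        TokenKind kind;
        if (KEYWORDS.contains(s)) {
            kind = TokenKind.KEYWORD;
        } else if (SYMBOLS.contains(s)) {
            kind = TokenKind.SYMBOL;
        } else if (s.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            kind = TokenKind.IDENTIFIER;
        } else if (s.matches("-?[0-9]+L?")) {
            kind = TokenKind.INTEGER;
        } else if (s.matches("-?[0-9]+\\.[0-9]+|-?[0-9]+e[0-9]+")) {
            kind = TokenKind.FLOAT;
        } else if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            kind = TokenKind.STRING;
        } else {
            kind = TokenKind.UNKNOWN;
        }
        return new Token(kind, s);
    }

    public static void main(String[] args) {
        String[] samples = {"maxsize", "while", "-44", "5L", "0.0034", "1e5", "==", "\"error\""};
        for (String s : samples) {
            System.out.println(classify(s));
        }
    }
}
```

Keywords are tested before identifiers because every keyword also fits the identifier pattern; the order of the checks is what keeps the two categories apart.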
![Page 10: Compiler Construction Dr. Naveed Ejaz Lecture 5. Lexical Analysis](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f345503460f94c518a6/html5/thumbnails/10.jpg)
Ad-hoc Lexer
Hand-write code to generate tokens.
Partition the input string by reading left-to-right, recognizing one token at a time.
![Page 11: Compiler Construction Dr. Naveed Ejaz Lecture 5. Lexical Analysis](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f345503460f94c518a6/html5/thumbnails/11.jpg)
Ad-hoc Lexer
Look-ahead is required to decide where one token ends and the next token begins.
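To see why look-ahead is unavoidable, consider reading a number: the scanner only learns that the number has ended after it has read one character too many. The sketch below shows one possible way to handle this in Java with the standard `java.io.PushbackReader`, which lets the extra character be returned to the stream. The class `LookAheadDemo` and its methods are hypothetical names chosen for this illustration.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class LookAheadDemo {
    // Read a run of digits. The loop can only stop after it has read one
    // character too many, so that character is pushed back for the next token.
    static String readDigits(PushbackReader in) throws IOException {
        StringBuilder digits = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && Character.isDigit((char) c)) {
            digits.append((char) c);
        }
        if (c != -1) {
            in.unread(c);   // give the look-ahead character back
        }
        return digits.toString();
    }

    // Convenience wrapper for the example: a number, one character, a number.
    static String demo(String input) {
        try (PushbackReader in = new PushbackReader(new StringReader(input))) {
            String first = readDigits(in);
            char op = (char) in.read();
            String second = readDigits(in);
            return first + " " + op + " " + second;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("1000+44"));
    }
}
```

Calling `demo("1000+44")` returns `1000 + 44`: the `+` that terminated the first number is not lost, because it was pushed back before the next read.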
![Page 12: Compiler Construction Dr. Naveed Ejaz Lecture 5. Lexical Analysis](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f345503460f94c518a6/html5/thumbnails/12.jpg)
Ad-hoc Lexer

    class Lexer {
        InputStream s;
        char next; // look-ahead character
        Lexer(InputStream _s) {
            s = _s;
            next = (char) s.read(); // prime the look-ahead; read() returns an int
        }
![Page 17: Compiler Construction Dr. Naveed Ejaz Lecture 5. Lexical Analysis](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f345503460f94c518a6/html5/thumbnails/17.jpg)
Ad-hoc Lexer

    Token nextToken() {
        // Digits are tested first: idChar() also accepts digits, so testing
        // it first would send every number to readId().
        if( isNumber(next) ) return readNumber();
        if( idChar(next) ) return readId();
        if( next == '"' ) return readString();
        ...
        ...
![Page 21: Compiler Construction Dr. Naveed Ejaz Lecture 5. Lexical Analysis](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f345503460f94c518a6/html5/thumbnails/21.jpg)
Ad-hoc Lexer

    Token readId() {
        String id = "" + next;          // next already holds the first character
        while(true){
            next = (char) s.read();     // keep the shared look-ahead up to date
            if(idChar(next) == false)
                return new Token(TID, id);
            id = id + next;
        }
    }
![Page 28: Compiler Construction Dr. Naveed Ejaz Lecture 5. Lexical Analysis](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f345503460f94c518a6/html5/thumbnails/28.jpg)
Ad-hoc Lexer

    boolean idChar(char c){
        if( isAlpha(c) ) return true;
        if( isDigit(c) ) return true;
        if( c == '_' ) return true;
        return false;
    }
![Page 29: Compiler Construction Dr. Naveed Ejaz Lecture 5. Lexical Analysis](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f345503460f94c518a6/html5/thumbnails/29.jpg)
Ad-hoc Lexer

    Token readNumber(){
        String num = "" + next;         // next already holds the first digit
        while(true){
            next = (char) s.read();
            if( !isNumber(next) )
                return new Token(TNUM, num);
            num = num + next;
        }
    }
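The fragments on the preceding slides can be assembled into a small runnable lexer. The following is a sketch under several assumptions that are not in the lecture: the class names `Tok` and `AdHocLexer`, the use of `java.io.Reader`, end-of-input handling, whitespace skipping, and the `TSYM` and `TSTR` token kinds are all additions made for this example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical token type: a kind name plus the matched text.
final class Tok {
    final String kind;
    final String text;
    Tok(String kind, String text) { this.kind = kind; this.text = text; }
    @Override public String toString() { return kind + ":" + text; }
}

final class AdHocLexer {
    private static final int EOF = -1;
    private final Reader s;
    private int next;   // look-ahead character, or EOF

    AdHocLexer(Reader s) {
        this.s = s;
        advance();      // prime the look-ahead
    }

    private void advance() {
        try {
            next = s.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static boolean idChar(int c) {
        return Character.isLetter(c) || Character.isDigit(c) || c == '_';
    }

    // Returns null at end of input (an assumption of this sketch).
    Tok nextToken() {
        while (next != EOF && Character.isWhitespace(next)) advance();
        if (next == EOF) return null;
        if (Character.isDigit(next)) return readNumber();
        if (idChar(next)) return readId();
        if (next == '"') return readString();
        Tok t = new Tok("TSYM", String.valueOf((char) next));
        advance();
        return t;
    }

    private Tok readId() {
        StringBuilder id = new StringBuilder();
        while (next != EOF && idChar(next)) {
            id.append((char) next);
            advance();
        }
        return new Tok("TID", id.toString());
    }

    private Tok readNumber() {
        StringBuilder num = new StringBuilder();
        while (next != EOF && Character.isDigit(next)) {
            num.append((char) next);
            advance();
        }
        return new Tok("TNUM", num.toString());
    }

    private Tok readString() {
        StringBuilder str = new StringBuilder();
        advance();                              // skip the opening quote
        while (next != EOF && next != '"') {
            str.append((char) next);
            advance();
        }
        advance();                              // skip the closing quote, if any
        return new Tok("TSTR", str.toString());
    }

    static List<String> tokenize(String input) {
        AdHocLexer lexer = new AdHocLexer(new StringReader(input));
        List<String> out = new ArrayList<>();
        for (Tok t = lexer.nextToken(); t != null; t = lexer.nextToken()) {
            out.add(t.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("if(b == 0) a = b"));
    }
}
```

For the input `if(b == 0) a = b` this sketch yields `TID:if`, `TSYM:(`, `TID:b`, `TSYM:=`, `TSYM:=`, `TNUM:0`, `TSYM:)`, `TID:a`, `TSYM:=`, `TID:b`. The keyword `if` is reported as an identifier and `==` comes out as two separate symbols, because a single dispatch on the first character cannot tell these cases apart.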
![Page 32: Compiler Construction Dr. Naveed Ejaz Lecture 5. Lexical Analysis](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f345503460f94c518a6/html5/thumbnails/32.jpg)
Ad-hoc Lexer
Problems:
We do not know what kind of token we are going to read from seeing only the first character.
![Page 33: Compiler Construction Dr. Naveed Ejaz Lecture 5. Lexical Analysis](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f345503460f94c518a6/html5/thumbnails/33.jpg)
Ad-hoc Lexer
Problems:
If a token begins with “i”, is it an identifier “i” or the keyword “if”?
If a token begins with “=”, is it “=” or “==”?
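A hand-written lexer can work around both ambiguities, at the cost of extra special-case code. One common approach, which is not necessarily the one this course goes on to use, is to read the longest possible word before deciding between keyword and identifier, and to peek one character ahead after `=`. The Java sketch below illustrates this; `AmbiguityDemo`, `scanAt` and the kind names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class AmbiguityDemo {
    private static final Set<String> KEYWORDS =
        new HashSet<>(Arrays.asList("if", "else", "while", "for"));

    private static boolean idChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    // Scan one token starting at position pos and return "KIND:text".
    static String scanAt(String input, int pos) {
        char c = input.charAt(pos);
        if (Character.isLetter(c) || c == '_') {
            int end = pos;
            // Read the longest run of identifier characters first...
            while (end < input.length() && idChar(input.charAt(end))) {
                end++;
            }
            String word = input.substring(pos, end);
            // ...and only then decide between keyword and identifier.
            return (KEYWORDS.contains(word) ? "KEYWORD:" : "ID:") + word;
        }
        if (c == '=') {
            // One character of look-ahead separates "=" from "==".
            boolean twoEquals = pos + 1 < input.length() && input.charAt(pos + 1) == '=';
            return twoEquals ? "EQ:==" : "ASSIGN:=";
        }
        return "OTHER:" + c;
    }

    public static void main(String[] args) {
        System.out.println(scanAt("if(i == j)", 0));
        System.out.println(scanAt("i == j", 0));
        System.out.println(scanAt("i == j", 2));
        System.out.println(scanAt("z = 0", 2));
    }
}
```

Each new operator or keyword family needs another hand-written branch of this kind, which is part of why the ad-hoc approach scales poorly.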
![Page 34: Compiler Construction Dr. Naveed Ejaz Lecture 5. Lexical Analysis](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f345503460f94c518a6/html5/thumbnails/34.jpg)
Ad-hoc Lexer
We need a more principled approach.
Use a lexer generator that generates an efficient tokenizer automatically.
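A lexer generator takes a declarative description of each token class and produces the scanning code. The sketch below is not a lexer generator; it only approximates the idea in plain Java by driving a tokenizer from a table of regular expressions via the standard `java.util.regex` package. The class `RegexTokenizer`, the kind names and the exact patterns (including the added `;` symbol) are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexTokenizer {
    // One named group per token class. Earlier alternatives win, so KEYWORD
    // is listed before ID and "==" before the single-character symbols.
    private static final String[] KINDS =
        {"KEYWORD", "ID", "FLOAT", "INT", "STRING", "SYMBOL", "WS"};

    private static final Pattern SPEC = Pattern.compile(
          "(?<KEYWORD>(?:if|else|while|for)(?![A-Za-z0-9_]))"
        + "|(?<ID>[A-Za-z_][A-Za-z0-9_]*)"
        + "|(?<FLOAT>[0-9]+\\.[0-9]+)"
        + "|(?<INT>[0-9]+)"
        + "|(?<STRING>\"[^\"]*\")"
        + "|(?<SYMBOL>==|[()+*/{}<>=;])"
        + "|(?<WS>\\s+)");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = SPEC.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {   // no token class matches at this position
                throw new IllegalArgumentException("unexpected character at " + pos);
            }
            for (String kind : KINDS) {
                if (m.group(kind) != null) {
                    if (!kind.equals("WS")) {   // whitespace is skipped
                        tokens.add(kind + ":" + m.group(kind));
                    }
                    break;
                }
            }
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("if( i == j ) z = 0; else z = 1;"));
    }
}
```

With the specification in one table, adding a token class means adding one pattern rather than another hand-written branch. A real generator would additionally compile the patterns into a single automaton instead of trying a backtracking regex at every position.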