chapter 2 chang chi-chung 2007.3.15. lexical analyzer the tasks of the lexical analyzer: remove...
Post on 20-Dec-2015
229 views
TRANSCRIPT
![Page 1: Chapter 2 Chang Chi-Chung 2007.3.15. Lexical Analyzer The tasks of the lexical analyzer: Remove white space and comments Encode constants as tokens](https://reader035.vdocuments.us/reader035/viewer/2022062304/56649d4e5503460f94a2db48/html5/thumbnails/1.jpg)
Chapter 2
Chang Chi-Chung
2007.3.15
![Page 2: Chapter 2 Chang Chi-Chung 2007.3.15. Lexical Analyzer The tasks of the lexical analyzer: Remove white space and comments Encode constants as tokens](https://reader035.vdocuments.us/reader035/viewer/2022062304/56649d4e5503460f94a2db48/html5/thumbnails/2.jpg)
Lexical Analyzer
The tasks of the lexical analyzer: Remove white space and comments Encode constants as tokens Recognize Keywords and Identifiers Store identifier names in a symbol table.
![Page 3: Chapter 2 Chang Chi-Chung 2007.3.15. Lexical Analyzer The tasks of the lexical analyzer: Remove white space and comments Encode constants as tokens](https://reader035.vdocuments.us/reader035/viewer/2022062304/56649d4e5503460f94a2db48/html5/thumbnails/3.jpg)
Lexical Analyzer
Lexical analyzerLexer()
<if> <(> <id, “peek”> <eq> <const, ‘\n’> <)> <id, “line”><assign> <id, “line”> <+> <num, 1> <;>
if (peek == ‘\n’) line = line +1
Parser or Syntax-Directed TranslatorParser()
token token Attribute
![Page 4: Chapter 2 Chang Chi-Chung 2007.3.15. Lexical Analyzer The tasks of the lexical analyzer: Remove white space and comments Encode constants as tokens](https://reader035.vdocuments.us/reader035/viewer/2022062304/56649d4e5503460f94a2db48/html5/thumbnails/4.jpg)
Remove white space and comments
for ( ; ; peek = next character ){ if ( peek is a blank or a tab ) do nothing; else if (peek is a newline) line = line + 1; else break;}
For white spaces and comments Eliminated by the lexical analyzer. Modifying the grammar to incorporate it into the
syntax. ( not easy )
![Page 5: Chapter 2 Chang Chi-Chung 2007.3.15. Lexical Analyzer The tasks of the lexical analyzer: Remove white space and comments Encode constants as tokens](https://reader035.vdocuments.us/reader035/viewer/2022062304/56649d4e5503460f94a2db48/html5/thumbnails/5.jpg)
Encode constants as tokens For a sequence of digits, the lexical analyzer must
pass to the parser a token. The token consists of the terminal along with an integer-
valued attribute computed from the digits. Example
31 + 28 + 29 <num, 31> <+> <num, 28> <+> <num,29>
if ( peek holds a digit ) {
v = 0;
do {
v = v * 10 + integer value of digit peek;
peek = next input character;
} while (peek holds a digit)
return token <num, v>;
}
![Page 6: Chapter 2 Chang Chi-Chung 2007.3.15. Lexical Analyzer The tasks of the lexical analyzer: Remove white space and comments Encode constants as tokens](https://reader035.vdocuments.us/reader035/viewer/2022062304/56649d4e5503460f94a2db48/html5/thumbnails/6.jpg)
Recognize Keywords and Identifiers Keyword
A fixed character string as punctuation marks or to identify constructs.
Example for、 while、 if
Identifier Use to name variables, arrays, functions, and the like. Parser treat identifiers as terminals. Example
count = count + increment; <id, ”count”> = <id, “count”> <+> <id, “increment”> <;>
![Page 7: Chapter 2 Chang Chi-Chung 2007.3.15. Lexical Analyzer The tasks of the lexical analyzer: Remove white space and comments Encode constants as tokens](https://reader035.vdocuments.us/reader035/viewer/2022062304/56649d4e5503460f94a2db48/html5/thumbnails/7.jpg)
Recognize Keywords and Identifiers The lexical analyzer uses a table to hold character
strings. A string table can be implemented by a hash table. Single Representation Reserved Words.
if ( peek holds a letter ) {
collect letters or digits into a buffer b;
s = string formed from the characters in b;
w = token returned by words.get(s);
if (w is not null) return w;
else {
Enter the key-value pair (s, <id, s>) into words
return token <id, s>;
}
}
![Page 8: Chapter 2 Chang Chi-Chung 2007.3.15. Lexical Analyzer The tasks of the lexical analyzer: Remove white space and comments Encode constants as tokens](https://reader035.vdocuments.us/reader035/viewer/2022062304/56649d4e5503460f94a2db48/html5/thumbnails/8.jpg)
Create a Lexical AnalyzerToken scan() {
skip white space. (A) handle numbers. (B) handle reserved words and identifiers. (C) Token t = new Token(peek);
peek = blank; (D) return t;}
![Page 9: Chapter 2 Chang Chi-Chung 2007.3.15. Lexical Analyzer The tasks of the lexical analyzer: Remove white space and comments Encode constants as tokens](https://reader035.vdocuments.us/reader035/viewer/2022062304/56649d4e5503460f94a2db48/html5/thumbnails/9.jpg)
Complete Lexical Analyzer (1)
package lexer;public class Token { public final int tag; public Token(int t) { tag = t; }}
public class Tag { public final static int NUM = 256, ID = 257, TRUE = 258, FALSE = 259;}
public class Num extends Token { public final int value; public Num(int v) { super(Tag.NUM); value = v; }}
public class Word extends Token { public final String lexeme; public Word(int t, String s) { super(t); lexeme = new String(s); }}
class Token+int tag
class Num+int value
class Word+string lexeme
![Page 10: Chapter 2 Chang Chi-Chung 2007.3.15. Lexical Analyzer The tasks of the lexical analyzer: Remove white space and comments Encode constants as tokens](https://reader035.vdocuments.us/reader035/viewer/2022062304/56649d4e5503460f94a2db48/html5/thumbnails/10.jpg)
Complete Lexical Analyzer (2)package lexer;import java.io.*;import java.util.*;
public class Lexer { public int line = 1; private char peek = ' '; private Hashtable words = new Hashtable();
void reserve(Word t) { words.put(t.lexeme, t); }
public Lexer() { reserve( new Word(Tag.TRUE, "true") ); reserve( new Word(Tag.FALSE, "false") ); }
![Page 11: Chapter 2 Chang Chi-Chung 2007.3.15. Lexical Analyzer The tasks of the lexical analyzer: Remove white space and comments Encode constants as tokens](https://reader035.vdocuments.us/reader035/viewer/2022062304/56649d4e5503460f94a2db48/html5/thumbnails/11.jpg)
Complete Lexical Analyzer (3)public Token scan() throws IOException { for ( ; ; peek = (char) System.in.read() ) { if ( peek == ' ' || peek == '\t' ) continue; else if ( peek == '\n' ) line = line + 1; else break; } if ( Character.isDigit(peek) ) { int v = 0; do { v = v * 10 + Character.digit(peek, 10); peek = (char) System.in.read(); } while ( Character.isDigit(peek) ) return new Num(v); } }}
C
D
![Page 12: Chapter 2 Chang Chi-Chung 2007.3.15. Lexical Analyzer The tasks of the lexical analyzer: Remove white space and comments Encode constants as tokens](https://reader035.vdocuments.us/reader035/viewer/2022062304/56649d4e5503460f94a2db48/html5/thumbnails/12.jpg)
Complete Lexical Analyzer (4)public Token scan() throws IOException {
if ( Character.isLetter(peek) ) { StringBuffer b = new StringBuffer(); do { b.append(peek); peek = (char) System.in.read() } while ( Character.isLetterOrDigit(peek) ); String s = b.toString(); Word w = (Word) words.get(s); if (w != null) return w; w = new Word(Tag.ID, s); words.put(s, w); return w; } Token t = new Token(peek); peek = ' '; return t; }}
A
B
![Page 13: Chapter 2 Chang Chi-Chung 2007.3.15. Lexical Analyzer The tasks of the lexical analyzer: Remove white space and comments Encode constants as tokens](https://reader035.vdocuments.us/reader035/viewer/2022062304/56649d4e5503460f94a2db48/html5/thumbnails/13.jpg)
Symbol Tables
Symbol tables are data structures Used by compilers to hold information about
source-program constructs. Scope of identifier x
The scope of a particular declaration x Scope
A portion of a program that is the scope of one or more declaration.
![Page 14: Chapter 2 Chang Chi-Chung 2007.3.15. Lexical Analyzer The tasks of the lexical analyzer: Remove white space and comments Encode constants as tokens](https://reader035.vdocuments.us/reader035/viewer/2022062304/56649d4e5503460f94a2db48/html5/thumbnails/14.jpg)
Symbol Tables
w
x int
y int
w int
y bool
z int
B0
B1
B3
{
int x1, int y1;
{
int w2; bool y2; int z2;
w2; x1; y2; z2;
}
w0; x1; y1;
}
![Page 15: Chapter 2 Chang Chi-Chung 2007.3.15. Lexical Analyzer The tasks of the lexical analyzer: Remove white space and comments Encode constants as tokens](https://reader035.vdocuments.us/reader035/viewer/2022062304/56649d4e5503460f94a2db48/html5/thumbnails/15.jpg)
Symbol Tablespackage symbols;import java.util.*;public class Env {
private Hashtable table;protected Env prev;
public Env(Env p) {table = new Hashtable();prev = p;
}
public void put(String s, Symbol sym) {table.put(s, sym);
}
public Symbol get(String s) {for (Env e = this; e != null; e = e.prev) { Symbol found = (Symbol)(e.table.get(s));
if (found != null) return found;}return null;
}}
w
x inty int
w inty bool
z int
B0
B1
B3
![Page 16: Chapter 2 Chang Chi-Chung 2007.3.15. Lexical Analyzer The tasks of the lexical analyzer: Remove white space and comments Encode constants as tokens](https://reader035.vdocuments.us/reader035/viewer/2022062304/56649d4e5503460f94a2db48/html5/thumbnails/16.jpg)
The Use of Symbol Tablesprogram → block { top = null; } block → ‘{‘ { saved = top; top = new Env(top); print(“{ “); } decls stmts ‘}’ { top = saved; print(“} “); }
decls → decls decl | ε
decl → type id ; { s = new Symbol; s.type = type.lexeme; top.put(id.lexeme, s); } stmts → stmts stmt | ε
stmt → block | factor ; { print(“; “); }
factor → id { s = top.get(id.lexeme); print(id.lexeme); print(“:”); print(s.type); }
![Page 17: Chapter 2 Chang Chi-Chung 2007.3.15. Lexical Analyzer The tasks of the lexical analyzer: Remove white space and comments Encode constants as tokens](https://reader035.vdocuments.us/reader035/viewer/2022062304/56649d4e5503460f94a2db48/html5/thumbnails/17.jpg)
Intermediate Code Generation Two most important intermediate representations.
Trees Parse trees, syntax trees (abstract trees) Example
while ( expr ) stmt op: while E1 : expr E2 : stmt
Linear representations Three-address code Example
ifFalse x goto L ifTrue x goto L goto L x [ y ] = z x = y [ z ]
op
E1 E2
x = y op z
![Page 18: Chapter 2 Chang Chi-Chung 2007.3.15. Lexical Analyzer The tasks of the lexical analyzer: Remove white space and comments Encode constants as tokens](https://reader035.vdocuments.us/reader035/viewer/2022062304/56649d4e5503460f94a2db48/html5/thumbnails/18.jpg)
Intermediate Code Generation
Parser or Syntax-Directed TranslatorParser()
If
eq
peek
assign
(int)
‘\n’
line
line 1
+
1: t1 = (int) ‘\n’2: ifFalse peek == t1 goto 43: line = line + 14:
or
if (peek == ‘\n’) line = line +1
![Page 19: Chapter 2 Chang Chi-Chung 2007.3.15. Lexical Analyzer The tasks of the lexical analyzer: Remove white space and comments Encode constants as tokens](https://reader035.vdocuments.us/reader035/viewer/2022062304/56649d4e5503460f94a2db48/html5/thumbnails/19.jpg)
Syntax TreesConcrete Syntax Abstract Syntax
=
||
&&
== !=
< <= >= >
+ -
* / %
!
-unary
[ ]
assign
cond
cond
rel
rel
op
op
not
minus
access
![Page 20: Chapter 2 Chang Chi-Chung 2007.3.15. Lexical Analyzer The tasks of the lexical analyzer: Remove white space and comments Encode constants as tokens](https://reader035.vdocuments.us/reader035/viewer/2022062304/56649d4e5503460f94a2db48/html5/thumbnails/20.jpg)
Syntax Treesseq
seq
seq if
while
null sometree
for anexpression
sometree
for anexpression
sometree
for anexpression
sometree
for anexpression
![Page 21: Chapter 2 Chang Chi-Chung 2007.3.15. Lexical Analyzer The tasks of the lexical analyzer: Remove white space and comments Encode constants as tokens](https://reader035.vdocuments.us/reader035/viewer/2022062304/56649d4e5503460f94a2db48/html5/thumbnails/21.jpg)
Static Checking Static checks are consistency checks that are
done during compilation. Syntactic Checking Type Checking
L-values and R-values i = 5 i = i + 1