CSC488/2107 Winter 2019 — Compilers & Interpreters
https://www.cs.toronto.edu/~csc488h/
Peter [email protected]
Agenda
• Recognize, Analyze, Transform
• Lexical analysis
• Building lexical analyzers
Recognize Analyze Transform
Frontend Backend
Recognize
• Lexical structure
• Syntactic structure
• Highly language/syntax specific
• Data flow:
  ➥ Stream of Characters
  ➥ Stream of Tokens
  ➥ Parse Tree (Concrete Syntax)
Analyze
• Semantic meaning
• Less language specific
• Data flow:
  ➥ Parse Tree
  ➥ Abstract Syntax Tree (possibly with annotations and/or associated symbol tables)
Transform (Lower)
• Memory layout
• Optimization (optional)
• Code generation
• Very target specific
• Data flow:
  ➥ Abstract Syntax Tree
  ➥ Intermediate Languages/Representations (optional)
  ➥ Target Machine Code
Source Code
  ➥ Characters ➥ Lexical Analysis
  ➥ Tokens ➥ Syntax Analysis
  ➥ Parse tree ➥ Semantic Analysis
  ➥ Intermediate Language ➥ Code Generation
  ➥ Machine code / Bytecode ➥ Object Code
C pre-processor
Pre-processed:
#include <stdio.h>
Post-processed:
/* complete contents of stdio.h */
Pre-processed:
#define PI 3.1415
float pi = PI;
Post-processed:
float pi = 3.1415;
Source Code
  ➥ Characters ➥ Pre-processor Lexical Analysis
  ➥ Tokens ➥ Pre-processor Syntax Analysis
  ➥ Parse tree ➥ Pre-processor Engine
  ➥ Characters ➥ C Lexical Analysis
  ➥ Tokens ➥ C Syntax Analysis
  ➥ Parse tree
Lexical Analysis
Recognizing the textual building blocks of source code
A scanner or lexer converts a stream of characters into a stream of lexical tokens
Characters
• Visual representation (human):
  • ASCII characters or Unicode code points
• Physical byte representation:
  • Fixed length: 7-bit ASCII, UCS-4 (UTF-32)
  • Variable length: UTF-8, UTF-16
• Integers to the compiler
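To make the fixed versus variable length distinction concrete, here is a minimal Python sketch that reports how many bytes a single code point occupies in three encodings. The helper name `encoded_lengths` and the sample characters are illustrative choices, not part of the lecture.

```python
# Sample code points: 'A', 'é', '€', and an emoji outside the Basic Multilingual Plane.
SAMPLES = ["A", "\u00e9", "\u20ac", "\U0001F600"]

def encoded_lengths(ch):
    """Return the byte length of one character per encoding (no byte-order mark)."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }

for ch in SAMPLES:
    # ord() gives the integer the compiler ultimately works with.
    print(f"U+{ord(ch):04X}", encoded_lengths(ch))
```

UTF-8 and UTF-16 lengths change with the code point, while UTF-32 always uses four bytes, which is why a scanner reading UTF-8 has to decode before it can compare characters.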
Lexical Token
• One of a fixed set of distinguishing categories:
  • Identifiers
  • Reserved identifiers / keywords
  • Literal constants: numeric, string
  • Special punctuation (braces, symbols, etc.)
  • Comments
• Language specific
Scanner/Lexer
• Consumes character input
• Identifies lexical boundaries
• Emits a stream of tokens
• Identifies malformed input and emits errors
• Chooses what to ignore (comments, whitespace)
• Manages additional bookkeeping like source coordinates (input filenames, line and column numbers)
if x < y { v = 1 }
if x<y{v=1}
if x < y { v = 1 }
IF
IDENT x
LT
IDENT y
LBRACE
IDENT v
EQ
INTEGER 1
RBRACE
Careful language design choices can enable fast scanners
Building lexical analyzers
struct Token {
    enum { IF, LT, IDENT, LITERAL, … } type;
    union {
        char *ident;
        int literal;
    };
    // more bookkeeping
};
data Token = If
           | Lt
           | Ident String
           | Literal Integer
           | …
Idea: Use finite automata (state machines) to recognize tokens out of a stream of characters
Example: Addition expressions
Example expressions
1
123+456
1+2+3+456
Lexical structure
• 2 token types
  • Plus
  • Positive integer literal
• No whitespace handling
Σ — Vocabulary
Σ = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, + }
Represent finite automata (state machines) using state transition diagrams
State transition diagram: Plus
Start --[ + ]--> Emit Plus
State transition diagram: Positive integer literals
Start --[ 1…9 ]--> X
X --[ 0…9 ]--> X
X --[ …else… ]--> Emit Literal
Start --[ 1…9 ]--> X
X --[ 0…9 ]--> X
X --[ …else… ]--> Emit Literal
Start --[ + ]--> Emit Plus
Non-deterministic finite automata (NFA)
[Diagram: a single Start state with two λ-transitions, one into the literal machine (1…9 to X, 0…9 looping on X, ending in Emit Literal) and one into the plus machine (+ to Emit Plus)]
Deterministic finite automata (DFA)
[Diagram: Start goes to X on 1…9 and to Emit Plus on +; X loops on 0…9 and goes to Emit Literal on +; no λ-transitions remain]
Table driven DFA
State \ Input   +    0      1   2   3   4   5   6   7   8   9   Action
S               T    error  U   U   U   U   U   U   U   U   U
T               S+                                              Emit Plus
U               T    U+     U+  U+  U+  U+  U+  U+  U+  U+  U+  Emit Literal
Notation: V to change state, V+ to change while consuming 1 input character
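The table above can be executed almost literally. The following Python sketch is one possible reading of it, under two assumptions that the slide does not state: a state's action fires when the machine leaves that state, and a pending token is flushed at end of input. The names `TABLE`, `ACTION` and `scan` are hypothetical.

```python
DIGITS = "0123456789"

# TABLE[state][char] = (next_state, consume). Missing entries are errors.
# A True flag corresponds to the "V+" notation (consume one input character).
TABLE = {
    "S": {"+": ("T", False), **{d: ("U", False) for d in "123456789"}},
    "T": {"+": ("S", True)},
    "U": {"+": ("T", False), **{d: ("U", True) for d in DIGITS}},
}
# Action column of the table, keyed by state.
ACTION = {"T": "PLUS", "U": "LITERAL"}

def scan(text):
    """Run the table-driven DFA over text and return (kind, lexeme) pairs."""
    tokens, state, pos, lexeme = [], "S", 0, ""
    while pos < len(text):
        ch = text[pos]
        if ch not in TABLE[state]:
            raise ValueError(f"unexpected {ch!r} at offset {pos}")
        nxt, consume = TABLE[state][ch]
        if consume:
            lexeme += ch
            pos += 1
        # Assumption: emit when leaving a state that has an action.
        if nxt != state and state in ACTION:
            tokens.append((ACTION[state], lexeme))
            lexeme = ""
        state = nxt
    if state in ACTION:  # Assumption: flush the pending token at end of input.
        tokens.append((ACTION[state], lexeme))
    return tokens

print(scan("12+3"))
```

Because the driver loop never changes, supporting a new token type only means adding rows and columns to the table.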
while True:
    c = curInput()
    if state is START:
        if c == '+':
            emitPlus()
            nextInput()
        elif c in digits19:
            save(c)
            state = LITERAL
            nextInput()
        else:
            error()
    elif state is LITERAL:
        if c in digits09:
            save(c)
            nextInput()
        else:
            emitLiteral(getSaved())
            resetSaved()
            state = START
    if c is EOF:
        break
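The slide code relies on helpers (`curInput`, `nextInput`, `save`, and so on) that are not shown. A self-contained variant might look like the sketch below. The `scan` function, the `EOF` sentinel and the token tuples are assumptions made for illustration, and this version checks for end of input before reporting an error in the start state.

```python
EOF = None  # sentinel for end of input (an assumption of this sketch)

def scan(text):
    """Direct-coded scanner for addition expressions; returns (kind, lexeme) pairs."""
    tokens, saved, pos = [], [], 0
    state = "START"
    while True:
        c = text[pos] if pos < len(text) else EOF
        if state == "START":
            if c is EOF:
                break
            if c == "+":
                tokens.append(("PLUS", "+"))
                pos += 1
            elif c in "123456789":
                saved.append(c)
                state = "LITERAL"
                pos += 1
            else:
                raise ValueError(f"unexpected {c!r} at offset {pos}")
        else:  # state == "LITERAL"
            if c is not EOF and c in "0123456789":
                saved.append(c)
                pos += 1
            else:
                # The character that ended the literal is not consumed here;
                # it is examined again from the START state.
                tokens.append(("LITERAL", "".join(saved)))
                saved.clear()
                state = "START"
    return tokens

print(scan("1+2+3+456"))
```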
Regular Expressions
• A regular expression is a rigorous mathematical statement defining the members of a regular set
• Very compact means of specifying the structure of lexical tokens
Notation & Definitions
Let Ø be the empty set
Let Σ be a finite set of distinguished characters (the vocabulary)
May use quote marks to avoid confusion: Σ = { ‘{’, ‘}’, ‘,’ }
A string is defined inductively by cases:
1. The empty or null string, denoted λ
   • Ø ≠ λ
2. A character from Σ is itself a string
3. The concatenation of two strings is a string
   • For any strings S and T, both S T and T S are strings
   • For any string S, λ S = S λ = S
Ø is also a regular expression denoting the empty set
Any string S is a regular expression, denoting the set containing that string
Forming regular expressions
For any two regular expressions A and B, the following are also regular expressions:
1. Alternation: A | B
   • Set union
2. Concatenation: A B
   • Set of all strings formed by the concatenation of any string from A and any string from B
3. Kleene Closure: A*
   • Zero or more concatenations of A
4. Parentheses: ( A )
   • Disambiguation
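Since each regular expression denotes a set of strings, the three operators can be tried out directly on small finite sets. The sketch below uses Python sets, with `""` standing in for λ and `set()` for Ø. The function names are hypothetical, and because A* is infinite in general, `kleene` only builds a bounded approximation.

```python
from itertools import product

def alternation(a, b):
    """A | B: set union."""
    return a | b

def concatenation(a, b):
    """A B: every string of A followed by every string of B."""
    return {s + t for s, t in product(a, b)}

def kleene(a, max_reps):
    """Bounded approximation of A*: zero up to max_reps concatenations of A."""
    result, layer = {""}, {""}
    for _ in range(max_reps):
        layer = concatenation(layer, a)
        result |= layer
    return result

A, B = {"a", "ab"}, {"b", ""}
print(alternation(A, B))
print(concatenation(A, B))
print(kleene({"a"}, 3))
# Ø and λ behave differently under concatenation:
print(concatenation(A, set()), concatenation(A, {""}) == A)
```

Concatenating with the empty set yields the empty set, while concatenating with the set containing only λ leaves A unchanged, which is the practical meaning of Ø ≠ λ.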
Useful shorthands
1. Positive Closure: A+
   • A A* (one or more concatenations)
2. Optional: A?
   • A | λ (zero or one A)
3. Complement: Not(A)
   • Match anything from Σ that does not match A
4. Character ranges: [ “A” … “Z” ]
   • “A” | “B” | … | “Y” | “Z”
   • When it’s clear what the “…” ranges over
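These shorthands map closely onto the syntax of Python's `re` module, so the claimed equivalences can be spot-checked. The sketch below compares pairs of patterns on a handful of sample strings. This is only a sanity check over the samples, not a proof, and `same_language` is a hypothetical helper. Note that Python's `[^a]` complements over all characters rather than over a chosen Σ.

```python
import re

def same_language(p, q, samples):
    """True if patterns p and q accept exactly the same strings among samples."""
    return all(
        bool(re.fullmatch(p, s)) == bool(re.fullmatch(q, s)) for s in samples
    )

SAMPLES = ["", "a", "aa", "aaa", "b", "ab", "A", "D"]

positive_closure = same_language(r"a+", r"aa*", SAMPLES)      # A+ versus A A*
optional = same_language(r"a?", r"(a|)", SAMPLES)             # A? versus A | λ
char_range = same_language(r"[A-C]", r"(A|B|C)", SAMPLES)     # range versus alternation
complement = (bool(re.fullmatch(r"[^a]", "b")), bool(re.fullmatch(r"[^a]", "a")))

print(positive_closure, optional, char_range, complement)
```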
Examples
Addition expressions tokens as regular expressions
Σ = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, + }
Digits19 = ( 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 )
Digits = ( 0 | Digits19 )
Plus = ( + )
Literal = ( Digits19 Digits* )
Token = Plus | Literal
More examples (1)
Digit = “0” … “9”
Letter = “a” … “z” | “A” … “Z”
Identifier = ( Letter | “_” ) ( Letter | Digit | “_” )*
More examples (2)
Digit1 = “1” … “9”
Digit = “0” | Digit1
HexDigit = Digit | “a” … “f” | “A” … “F”
DecLiteral = Digit1 Digit*
HexLiteral = “0” ( “x” | “X” ) HexDigit+
Literal = (“-”)? DecLiteral | HexLiteral
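As a quick check of what these definitions accept, here is a direct translation into Python's `re` syntax. The constant names and the `is_literal` helper are illustrative. Concatenation binds tighter than alternation, so the optional minus sign belongs to the decimal alternative only.

```python
import re

DEC_LITERAL = r"[1-9][0-9]*"          # Digit1 Digit*
HEX_LITERAL = r"0[xX][0-9a-fA-F]+"    # "0" ("x" | "X") HexDigit+
# ("-")? DecLiteral | HexLiteral: the "-" applies only to DecLiteral.
LITERAL = re.compile(rf"-?{DEC_LITERAL}|{HEX_LITERAL}")

def is_literal(s):
    """True if the whole string s is a Literal under the definitions above."""
    return LITERAL.fullmatch(s) is not None

for candidate in ["42", "-42", "0x1F", "-0x1F", "0", "007"]:
    print(candidate, is_literal(candidate))
```

Two consequences of the definitions as written: a negative hex literal such as `-0x1F` is rejected, and so is a bare `0`, because DecLiteral must start with a non-zero digit.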
More examples (3)
EOL = ‘\r’ | ‘\n’ | ‘\r’ ‘\n’
PythonComment = ‘#’ Not(EOL)* EOL
CComment = ‘/’ ‘*’ Not(‘*’ ‘/’)* ‘*’ ‘/’
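One way to try these comment patterns is with Python's `re` module, as sketched below. Two details are choices of this sketch rather than of the slide: `\r\n` is listed first in `EOL` because Python tries alternatives in order, and `Not('*' '/')*` is spelled out using only character classes and repetition so that the comment body can never contain `*/`. The `first_match` helper is hypothetical.

```python
import re

EOL = r"(?:\r\n|\r|\n)"
PYTHON_COMMENT = re.compile(rf"#[^\r\n]*{EOL}")
# Body: any non-star character, or a run of stars followed by neither '*' nor '/'.
C_COMMENT = re.compile(r"/\*(?:[^*]|\*+[^*/])*\*+/")

def first_match(pattern, text):
    """Return the first substring of text matched by pattern, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None

print(repr(first_match(PYTHON_COMMENT, "x = 1  # note\r\ny = 2")))
print(repr(first_match(C_COMMENT, "a /* one */ b /* two */")))
print(repr(first_match(C_COMMENT, "/* unterminated")))
```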
Great… but can we use them in a lexer?
By Thompson’s construction, a regular expression can always be converted into an NFA
NFAs are equivalent to DFAs
It’s always possible to convert an NFA into a DFA
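The standard way to perform this conversion is the subset construction, in which each DFA state is a set of NFA states. The sketch below applies it to a small NFA modelled on the addition-expression example. The state names (`start`, `lit0`, `plus0`, `plus1`, `X`) and the use of `""` as the λ label are assumptions of this sketch, and the automaton recognizes a single token rather than looping back to the start.

```python
EPS = ""  # label used for λ-transitions in this sketch

# NFA[state][symbol] = set of possible next states.
NFA = {
    "start": {EPS: {"lit0", "plus0"}},
    "lit0": {d: {"X"} for d in "123456789"},
    "X": {d: {"X"} for d in "0123456789"},
    "plus0": {"+": {"plus1"}},
    "plus1": {},
}
ALPHABET = "0123456789+"

def closure(states):
    """All NFA states reachable from states using only λ-transitions."""
    seen, todo = set(states), list(states)
    while todo:
        for nxt in NFA[todo.pop()].get(EPS, ()):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return frozenset(seen)

def to_dfa(start):
    """Subset construction: returns (initial DFA state, transition table)."""
    first = closure({start})
    dfa, todo = {}, [first]
    while todo:
        current = todo.pop()
        if current in dfa:
            continue
        dfa[current] = {}
        for sym in ALPHABET:
            moved = set()
            for s in current:
                moved |= NFA[s].get(sym, set())
            if moved:
                target = closure(moved)
                dfa[current][sym] = target
                todo.append(target)
    return first, dfa

first, dfa = to_dfa("start")
print(len(dfa), "DFA states")
```

For this NFA the result has three states. In the worst case the subset construction can produce exponentially many DFA states, although token definitions in practice tend to stay small.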
DFA scanners can be implemented extremely efficiently
See re2c for an even faster, code generation approach
Scanner development options
• Write by hand
• Use scanner generator tools
  • Provide a specially formatted definition file containing regular expressions and code fragments
  • Generates source code implementing the scanner
  • GNU Flex, ANTLR, PLY (Python Lex-Yacc), Ragel, re2c
• Use built-in language support for regular expressions
scanner.py
if x < y { v = 1 }
IF
IDENT x
LT
IDENT y
LBRACE
IDENT v
EQ
INTEGER 1
RBRACE
import re

SPEC = r'''
    (?P<IDENT>  [_a-zA-Z] [_a-zA-Z0-9]* )
  | (?P<NUMBER> [-]? [1-9] [0-9]* )
  | (?P<LT>     < )
  | (?P<EQ>     = )
  | (?P<LBRACE> { )
  | (?P<RBRACE> } )
  | (?P<WS>     \s+ )
'''

lex = re.compile(SPEC, re.VERBOSE).match
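The slide stops at the compiled matcher. A driver loop around it might look like the following sketch. The `tokenize` generator and the `KEYWORDS` table are hypothetical additions, the braces are escaped for readability, and the number token keeps the `NUMBER` name used in the specification.

```python
import re

SPEC = r'''
    (?P<IDENT>  [_a-zA-Z] [_a-zA-Z0-9]* )
  | (?P<NUMBER> [-]? [1-9] [0-9]* )
  | (?P<LT>     <  )
  | (?P<EQ>     =  )
  | (?P<LBRACE> \{ )
  | (?P<RBRACE> \} )
  | (?P<WS>     \s+ )
'''
lex = re.compile(SPEC, re.VERBOSE).match

KEYWORDS = {"if": "IF"}  # hypothetical keyword table, not part of the slide

def tokenize(text):
    """Yield (kind, lexeme) pairs, skipping whitespace."""
    pos = 0
    while pos < len(text):
        m = lex(text, pos)  # try to match one token starting at pos
        if m is None:
            raise SyntaxError(f"unexpected {text[pos]!r} at offset {pos}")
        kind, value = m.lastgroup, m.group()
        pos = m.end()
        if kind == "WS":
            continue
        # Keywords are matched as identifiers first, then reclassified.
        if kind == "IDENT" and value in KEYWORDS:
            kind = KEYWORDS[value]
        yield kind, value

for token in tokenize("if x < y { v = 1 }"):
    print(*token)
```

Reclassifying keywords after matching an identifier keeps the regular expression small and avoids treating a name such as `iffy` as the keyword `if` followed by `fy`.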
Next Week
• Syntax analysis & parsing
No tutorial on Tuesday Jan 22