Lecture Compiler Construction
Franz Wotawa
Institute for Software Technology, Technische Universität Graz
Inffeldgasse 16b/2, A-8010 Graz, Austria
Summer term 2016
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 1 / 309
Compilers are everywhere
Programming languages like Java, C#, C, C++, Pascal, Modula, SML, Lisp, VHDL, Basic, ...
Graphical languages also need compilers (HTML, LaTeX, ...)
Means for communication between computers like XML
Natural languages
Compilers translate a sentence written in one language to another language.
Example – HTML
<!DOCTYPE HTML PUBLIC "...">
<html>
  <head>
    <meta content="text/html; ..." http-equiv="Content-Type">
    <title>Hello World</title>
  </head>
  <body>
    <h1>Hello World!</h1>
  </body>
</html>
Another example
(Source: en.wikipedia.org/wiki/Java_bytecode)
Compiler vs. Interpreter
Compiler: Translates a program from one language into another language in which programs are directly executed (C, C++, Pascal, ...).
Interpreter: Executes a program directly without converting it (BASIC, batch languages, ...).
Exact boundary difficult to define nowadays (byte code interpreter vs. CPU, which executes machine code statements).
Front end (analysis phase) of compilers is always needed!
Organizational issues
Lecture (Vorlesung): 2 h
Practical exercises (Übung): 1 h
  Examples from the Compiler Construction theory
  Hands-on part: Development of a compiler
  More information soon (via email and/or the webpage)
Office hours:
  Franz Wotawa: Tuesday, 13:00–14:00
  Roxane Koitz: Monday, 13:00–14:00
Lecture dates
Monday, 29.2.2016, 16:00 - 19:00 (preliminary discussion, lexical analysis)
Monday, 7.3.2016, 16:00 - 19:00 (lexical analysis, parsing)
Tuesday, 8.3.2016, 16:00 - 17:30 (parsing, attributed grammars)
Monday, 14.3.2016, 16:00 - 19:00 (attributed grammars, type checking)
Tuesday, 15.3.2016, 16:00 - 17:30 (type checking, runtime environment)
Monday, 11.4.2016, 16:00 - 19:00 (code generation)
Monday, 18.4.2016, 16:00 - 19:00 (code optimization)
Exam
Written form only; no papers, books etc. allowed!
Content: DFA, NFA, LL(1), SLR(1), Attributed Grammars, Type Checking, Code Generation, Code Optimization, etc.
1. date: Monday, 9.5.2016, 18:00-20:00, i13 (2 groups, each 1 hour)
2. date: Monday, 13.6.2016, 18:00-20:00, i13 (2 groups, each 1 hour)
3. date: in October 2016, will be announced soon
Literature, etc.
Aho, Sethi, Ullman, Compilers: Principles, Techniques, and Tools, Addison-Wesley, 1985, ISBN 0-201-10088-6.
Tremblay, Sorenson, The Theory and Practice of Compiler Writing, McGraw-Hill, 1985, ISBN 0-07-065161-2.
Copy of slides but no lecture notes
Compiler construction webpage: http://www.ist.tugraz.at/cb16.html
PART 1 - INTRODUCTION
What is a compiler?
Program
Source language ⇒ Target language
Error reports if source code contains errors
Source program −→ Compiler −→ Target program
                     ↓
               Error messages
Implications of definition
There are (very) many compilers
Thousands of source languages (e.g. Pascal, Modula, C, C++, Java, ...)
Thousands of target languages (other high-level programming languages, machine code)
Classification: single-pass, multi-pass, debugging, or optimizing
But: basics of creating a compiler are mostly consistent
Concept of compilation
Analysis phase
  Split source program into tokens
  Generate intermediate code (intermediate representation of source program)
Synthesis phase
  Generate target program based on intermediate code
Enables reuse of program parts for different source and target languages
Concept of compilation – Analysis
Operations used within program are stored in hierarchical structure
Syntax tree
Other tools which perform an analysis:
  Structure editors
  Pretty printers
  Static checkers
  Interpreters
  Web browsers
Language-processing systems
skeletal source program
↓
preprocessor
↓
source program
↓
compiler
↓
target assembly language
↓
assembler
↓
relocatable machine code
↓
loader/link-editor ←− library, relocatable object files
↓
absolute machine code
Components of a compiler
source code
↓
lexical analyser
↓
syntax analyser
↓
semantic analyser
↓
intermediate code generator
↓
code optimizer
↓
code generator
↓
target program
(All phases are connected to the symbol-table manager and the error handler.)
Lexical analysis, scanning
Example: pos := init + rate * 60
Tokens:
1. Identifier pos
2. Assignment symbol :=
3. Identifier init
4. Plus operator
5. Identifier rate
6. Multiplication operator
7. Constant (number) 60
Syntax tree
      :=
     /  \
  pos    +
        / \
    init   *
          / \
      rate   60
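The token list above can be reproduced with a small scanner. The following is a minimal sketch, assuming Python's `re` module and hypothetical token names (`ID`, `ASSIGN`, `PLUS`, `TIMES`, `NUMBER`) that do not appear on the slide.

```python
import re

# Hypothetical token names; the slide only describes the categories informally.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z][A-Za-z0-9]*"),
    ("ASSIGN", r":="),
    ("PLUS",   r"\+"),
    ("TIMES",  r"\*"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (token, lexeme) pairs; white space is filtered out."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

for token in tokenize("pos := init + rate * 60"):
    print(token)
```

Running it prints seven pairs, one per token in the list above.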
Syntax analysis, parsing
Grammatical analysis
Token ↔ rules of grammar
Rules of grammar (example)
1. Each identifier is an expression
2. Each number is an expression
3. Assuming that ex1 and ex2 are expressions, then ex1 + ex2 and ex1 ∗ ex2 are expressions as well
4. A term of the form identifier := expression is a statement
Parse tree
Example - Parse tree
assignment statement
├── identifier: pos
├── :=
└── expression
    ├── expression
    │   └── identifier: init
    ├── +
    └── expression
        ├── expression
        │   └── identifier: rate
        ├── *
        └── expression
            └── number: 60
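The four grammar rules can be turned into a small recursive-descent parser. This is only a sketch: the rules as stated do not fix operator precedence, so the code assumes the usual convention that `*` binds tighter than `+`, and the nested-tuple tree format is a hypothetical choice.

```python
import re

def parse_assignment(text):
    """Parse 'identifier := expression' into nested tuples (operator, left, right)."""
    # Simplification: characters that match no pattern are silently skipped.
    tokens = re.findall(r"[A-Za-z]\w*|\d+|:=|[+*]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():                      # rules 1 and 2: identifier or number
        tok = advance()
        return ("num", int(tok)) if tok.isdigit() else ("id", tok)

    def term():                        # rule 3 for '*', assumed to bind tighter
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expression():                  # rule 3 for '+'
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    target = advance()                 # rule 4: identifier := expression
    if not target[0].isalpha() or peek() != ":=":
        raise SyntaxError("expected 'identifier := expression'")
    advance()
    tree = (":=", ("id", target), expression())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse_assignment("pos := init + rate * 60"))
```

The result has the same shape as the syntax tree of the example statement: the assignment at the root, `+` below it, and `*` as the right operand of `+`.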
Semantic analysis
Check for semantic errors
Type checking
Identification of operators and operands
Necessary for code generation
Example: conversion from int to real
statement
  pos := init + rate * 60
becomes
  pos := init + rate * int2real(60)
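The int-to-real conversion can be sketched as a recursive pass over the syntax tree. The nested-tuple tree format and the `SYMBOL_TYPES` table are assumptions made for this illustration; the slide only shows the statement before and after the conversion.

```python
# Hypothetical symbol table: all three identifiers are assumed to be declared as real.
SYMBOL_TYPES = {"pos": "real", "init": "real", "rate": "real"}

def check(node):
    """Return (typed_node, type); wrap int operands in int2real where a real is needed."""
    kind = node[0]
    if kind == "id":
        return node, SYMBOL_TYPES[node[1]]
    if kind == "num":
        return node, ("int" if isinstance(node[1], int) else "real")
    op, left, right = node
    left, ltype = check(left)
    right, rtype = check(right)
    if op == ":=" and ltype == "int" and rtype == "real":
        raise TypeError("cannot assign a real value to an int variable")
    if ltype != rtype:                      # mixed int/real: convert the int side
        if ltype == "int":
            left, ltype = ("int2real", left), "real"
        else:
            right, rtype = ("int2real", right), "real"
    return (op, left, right), ltype

tree = (":=", ("id", "pos"),
        ("+", ("id", "init"), ("*", ("id", "rate"), ("num", 60))))
typed_tree, result_type = check(tree)
print(typed_tree)
```

Only the constant 60 is wrapped, because it is the only int operand that meets a real operand.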
Intermediate code generation
E.g.: three-address code
Example:
1. tmp1 := int2real(60)
2. tmp2 := id3 * tmp1
3. tmp3 := id2 + tmp2
4. id1 := tmp3
using the symbol table
  pos  (id1)  real  ...
  init (id2)  real  ...
  rate (id3)  real  ...
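Three-address code can be produced by a post-order walk over the syntax tree, creating one temporary per operation. The sketch below assumes a nested-tuple tree and a dictionary `ids` that maps source names to symbol-table names; both are illustrative choices, not part of the slides.

```python
def three_address_code(tree, ids):
    """Flatten a nested-tuple syntax tree into a list of three-address instructions."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"tmp{counter}"

    def gen(node):
        kind = node[0]
        if kind == "id":
            return ids[node[1]]              # symbol-table name, e.g. pos -> id1
        if kind == "num":
            return str(node[1])
        if kind == "int2real":
            operand = gen(node[1])
            temp = new_temp()
            code.append(f"{temp} := int2real({operand})")
            return temp
        op, left, right = node               # binary operator: operands first
        left_name = gen(left)
        right_name = gen(right)
        temp = new_temp()
        code.append(f"{temp} := {left_name} {op} {right_name}")
        return temp

    _, target, expr = tree                   # the root is the assignment
    value = gen(expr)
    code.append(f"{gen(target)} := {value}")
    return code

tree = (":=", ("id", "pos"),
        ("+", ("id", "init"),
         ("*", ("id", "rate"), ("int2real", ("num", 60)))))
ids = {"pos": "id1", "init": "id2", "rate": "id3"}
for line in three_address_code(tree, ids):
    print(line)
```

For the example tree this yields the four instructions listed above, in the same order.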
Code optimizer
Goal: faster machine code
Example:
1. tmp1 := int2real(60)
2. tmp2 := id3 * tmp1
3. tmp3 := id2 + tmp2
4. id1 := tmp3
⇓
1. tmp1 := id3 * 60.0
2. id1 := id2 + tmp1
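The two transformations in this example, evaluating `int2real(60)` at compile time and removing the final copy, can be sketched as two passes over the instruction list. The tuple format `(dest, op, arg1, arg2)` and the operator names `int2real` and `copy` are assumptions for this sketch, and temporaries are not renumbered, so the result uses `tmp2` where the slide uses `tmp1`.

```python
def optimize(code):
    """code: list of (dest, op, arg1, arg2) tuples; returns an optimized list."""
    constants = {}                       # temporaries known to hold a constant
    folded = []
    for dest, op, a1, a2 in code:
        a1 = constants.get(a1, a1)       # substitute known constants
        a2 = constants.get(a2, a2)
        if op == "int2real" and a1.isdigit():
            constants[dest] = f"{a1}.0"  # do the conversion at compile time
        else:
            folded.append((dest, op, a1, a2))
    result = []
    for dest, op, a1, a2 in folded:
        # Merge "t := x op y; d := t" into "d := x op y".
        # Assumption: the temporary t is not used anywhere else.
        if op == "copy" and result and result[-1][0] == a1:
            _, prev_op, p1, p2 = result.pop()
            result.append((dest, prev_op, p1, p2))
        else:
            result.append((dest, op, a1, a2))
    return result

code = [
    ("tmp1", "int2real", "60", None),
    ("tmp2", "*", "id3", "tmp1"),
    ("tmp3", "+", "id2", "tmp2"),
    ("id1", "copy", "tmp3", None),
]
for instruction in optimize(code):
    print(instruction)
```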
Code generator
Generation of actual target code (machine code, assembler)
Example:
1. MOVF id3, R2
2. MULF #60.0, R2
3. MOVF id2, R1
4. ADDF R2, R1
5. MOVF R1, id1
PART 2 - LEXICAL ANALYSIS
Goals of this section
Specification and implementation of lexical analysers
Easiest method:
1. Create diagram which describes structure of tokens
2. Manually translate diagram into a program
Regular expressions in finite state machines
Tasks
source program → lexical analyser ⇄ parser
  parser → lexical analyser: get next token
  lexical analyser → parser: token
  lexical analyser and parser both access the symbol table
Filter comments, white spaces, ...
Establish relations between error messages of compiler and source code (e.g. via line number)
Preprocessor functions (e.g. in C)
Create copy of source program containing error messages
Why separate lexical analysis and parsing?
1. Simpler design: e.g. removal of comments and white spaces enables use of simpler parser design
2. Improve efficiency: large portion of time is invested into reading of programs and conversion into tokens
3. Enhance portability of compilers
4. (There are tools for both phases)
Tokens, patterns, lexemes
Pattern: Rule describing a set of strings (a token)
Token: Set of strings described by a pattern
Lexeme: Sequence of characters which are matched by a pattern of a token
Token     Lexeme                  Pattern (verbal)
if        if                      if
relation  >, <, ...               < or > or ...
id        pi, test_cases          letter followed by letters, digits, or '_'
num       2.7, 0, 12e-4           any numeric constant
literal   "segmentation fault"    any characters between " and "
Tokens
Terminals in the grammar (checked by the parser)
In programming languages: keywords, operators, identifiers, constants, strings, brackets
Language conventions:
1. Tokens have to occur in specific places within the code
2. Tokens are separated by white spaces
   Example (Fortran): DO 5 I = 1.25 is interpreted as DO5I = 1.25
3. Keywords are reserved and must not be used as identifiers
Assessment:
  1. is (currently) not used
  2. is accepted
  3. is reasonable (and simplifies compiler construction)
  Example without reserved keywords: IF THEN THEN THEN = ELSE; ELSE ELSE = THEN;
Token attributes
Store lexeme for further processing
Store pointers to entries in symbol table
Line number or position of token within source code
Example: U = 2 * R * PI
  <id, pointer to U>
  <assign_op>
  <num, integer value 2>
  <mult_op>
  <id, pointer to R>
  <mult_op>
  <id, pointer to PI>
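The token-attribute pairs for this example can be sketched as follows. The symbol table is modelled as a plain list and the "pointer" as an index into it; both are simplifying assumptions for illustration, as is the function name `scan`.

```python
import re

def scan(text):
    """Return (tokens, symbol_table); identifiers carry an index into the symbol table."""
    symbol_table = []                  # list of identifier names, index = "pointer"
    tokens = []
    for lexeme in re.findall(r"[A-Za-z_]\w*|\d+|[=*]", text):
        if lexeme == "=":
            tokens.append(("assign_op", None))
        elif lexeme == "*":
            tokens.append(("mult_op", None))
        elif lexeme.isdigit():
            tokens.append(("num", int(lexeme)))      # attribute: integer value
        else:
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)          # install identifier once
            tokens.append(("id", symbol_table.index(lexeme)))
    return tokens, symbol_table

tokens, table = scan("U = 2 * R * PI")
print(tokens)
print(table)
```

Operators carry no attribute (`None`), numbers carry their value, and identifiers carry their position in the table.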
Lexical errors
fi ( a == 1) ... ⇒ fi is interpreted as undefined identifier
String sequences which cannot be matched to a token, e.g.: 2.3e*4 instead of 2.3e+4
Error correction:
1. Panic mode: Removal of faulty characters until a token can be matched
2. Remove a faulty character
3. Insert a new character
4. Replace a character
5. Change order of neighboring characters
Some errors (like the first example) may be only recognizable by the parser
Specification of tokens
Strings over an alphabet:
  Alphabet ... finite set of symbols, example: {0, 1} binary alphabet
  String (word, sentence) ... finite sequence of symbols of the alphabet
  ε ... empty string
Language: a set of strings over an alphabet
Operators:
  Concatenation: xy is the string obtained by attaching y to the end of x
  $x^i$ is defined as: $x^0 = \varepsilon$, and $x^i = x^{i-1}x$ for $i > 0$
  Prefix: computer → comp
  Suffix: computer → uter
  Substring: computer → put
  Subsequence: computer → opt
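The string operations above map directly onto short Python functions. The function names are chosen for this sketch; `power` follows the recursive definition of $x^i$.

```python
def is_prefix(s, x):
    return x.startswith(s)

def is_suffix(s, x):
    return x.endswith(s)

def is_substring(s, x):
    return s in x

def is_subsequence(s, x):
    # Consume x from left to right; every symbol of s must be found in order.
    remaining = iter(x)
    return all(symbol in remaining for symbol in s)

def power(x, i):
    # x^0 is the empty string, x^i = x^(i-1) x for i > 0
    return "" if i == 0 else power(x, i - 1) + x

print(is_prefix("comp", "computer"), is_suffix("uter", "computer"))
print(is_substring("put", "computer"), is_subsequence("opt", "computer"))
print(power("ab", 3))
```

Note that "opt" is a subsequence but not a substring of "computer", since its symbols are not adjacent.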
Operations on languages
Union: $L \cup M = \{x \mid x \in L \lor x \in M\}$
Concatenation: $LM = \{xy \mid x \in L \land y \in M\}$
Kleene closure: $L^* = \bigcup_{i=0}^{\infty} L^i$
Positive closure: $L^+ = \bigcup_{i=1}^{\infty} L^i$
Example: L = {A, ..., Z}, M = {0, 1, ..., 9}
  $L \cup M$: set of letters and digits
  $LM$: strings consisting of a letter followed by a digit
  $L^4$: strings made up of four letters
  $L^*$: strings made up of letters, including ε
  $M^+$: digit strings containing at least one digit
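For finite languages these operations can be tried out with Python sets. Since the closures are infinite, the sketch below only computes a finite approximation up to a chosen power; the `max_power` parameter and all function names are assumptions of this illustration.

```python
def union(L, M):
    return L | M

def concat(L, M):
    return {x + y for x in L for y in M}

def power(L, i):
    result = {""}                      # L^0 = {ε}
    for _ in range(i):
        result = concat(result, L)
    return result

def closure(L, max_power, positive=False):
    """Finite approximation of L* (or L+): union of L^i for i up to max_power."""
    start = 1 if positive else 0
    result = set()
    for i in range(start, max_power + 1):
        result |= power(L, i)
    return result

L = {"A", "B"}                         # small stand-in for {A, ..., Z}
M = {"0", "1"}                         # small stand-in for {0, ..., 9}
print(sorted(concat(L, M)))
print(len(power(L, 4)))
print("" in closure(L, 3), "" in closure(M, 3, positive=True))
```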
Regular expressions
Example: Pascal identifier: letter ( letter | digit )∗
Regular expression r defines a language over an alphabet Σ using the following rules:
1. The regular expression ε describes {ε}
2. a ∈ Σ: The regular expression a describes {a}
3. r, s are regular expressions (languages L(r), L(s)):
   1. (r)|(s) describes L(r) ∪ L(s)
   2. (r)(s) describes L(r)L(s)
   3. (r)∗ describes (L(r))∗
   4. (r)+ describes (L(r))+
Regular sets: the languages described by regular expressions
Examples
Assuming Σ = {a, b}
1. a|b describes the language {a, b}
2. (a|b)(a|b) describes {aa, ab, ba, bb}
3. a∗ describes {ε, a, aa, aaa, aaaa, ...}
4. (a|b)∗ describes all strings consisting of zero or more a's and b's
Not all languages can be described by regular expressions, e.g. {wcw | w is a string made up of a's and b's}
Regular expressions can only describe either a fixed number of repetitions or an unspecified (arbitrary) number of repetitions of a construct
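The four example expressions can be checked with Python's `re` module, whose syntax for `|`, `*`, and grouping coincides with the notation used here. The helper name `matches` is chosen for this sketch; `re.fullmatch` is used so that the whole word has to belong to the language.

```python
import re

def matches(pattern, word):
    """True if the whole word belongs to the language of the pattern."""
    return re.fullmatch(pattern, word) is not None

print(matches("a|b", "a"), matches("a|b", "ab"))
print(matches("(a|b)(a|b)", "ba"), matches("(a|b)(a|b)", "bab"))
print(matches("a*", ""), matches("a*", "aaaa"))
print(matches("(a|b)*", "abba"), matches("(a|b)*", "abc"))
```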
Regular definitions
A regular definition is a sequence of the form:
  d1 → r1
  ...
  dn → rn
whereby each di is a distinct name and each ri is a regular expression over $\Sigma \cup \{d_1, \ldots, d_{i-1}\}$, i.e. it may only use names defined before it
Example:
  letter → A | ... | Z | a | ... | z
  digit → 0 | 1 | ... | 9
  id → letter (letter | digit)∗
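A regular definition can be mimicked by building pattern strings from previously defined ones. This sketch uses Python's `re` syntax, with character classes standing in for the long alternations; the variable name `ident` is chosen to avoid shadowing Python's built-in `id`.

```python
import re

letter = "[A-Za-z]"                       # letter → A | ... | Z | a | ... | z
digit = "[0-9]"                           # digit → 0 | 1 | ... | 9
ident = f"{letter}({letter}|{digit})*"    # id → letter (letter | digit)*

print(ident)
print(bool(re.fullmatch(ident, "rate2")), bool(re.fullmatch(ident, "2rate")))
```

Each definition only refers to names introduced before it, so the final pattern is an ordinary regular expression.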
Token detection
Goal: Identify lexeme of token in input buffer
Output: Token-attribute pair
Regular expression   Token   Attribute value
white space          -       -
if                   if      -
then                 then    -
id                   id      pointer to table entry
num                  num     pointer to table entry
<                    relop   LT
>                    relop   GT
...                  ...     ...
Transition diagrams
Actions performed by lexical analyser (triggered by a get next token request of the parser)
States connected by directed edges
Edges are labeled:
  Labels contain symbols expected in input buffer in order to progress to next state
  Label other denotes all symbols not used by any other edge originating from the same node
One start state
End states after matching a token
Deterministic (not necessarily)
Example
Regular expression letter (letter | digit)∗ can be described by the following transition diagram:
  start → (1) --letter--> (2) --other--> ((3))   return(gettoken(), install_id())
  state 2 has a loop labeled "letter or digit"
Assume the input buffer contains:
  Pos  1  2  3  4  ...
       P  I  ␣  =  ...
and the current-symbol pointer is pointing to position 1
Token matching - Example
1. Current symbol = P, position = 1 ⇒ Transition from state 1 to state 2 and read new symbol
2. Current symbol = I, position = 2 ⇒ Remain in state 2 and read new symbol
3. Current symbol = SPACE, position = 3 ⇒ Transition from state 2 to state 3
4. State 3 is an end state
[Figure: the transition diagram for letter (letter | digit)∗, annotated with the symbols read: P on the edge from state 1 to state 2, I on the loop at state 2, ' ' on the edge from state 2 to state 3]
Example – Implementing a Lexer directly
Pseudo code
lexem = "";
if (next_char ∈ letter) then
    lexem = lexem + next_char;
    get_next_char();
    while (next_char ∈ letter ∪ digit) do
        lexem = lexem + next_char;
        get_next_char();
    return IDENTIFIER(lexem);
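The pseudo code translates almost line by line into Python. In this sketch the input is a string with an explicit position instead of a global `next_char`, and the function returns `None` when no identifier starts at the given position; both are assumptions beyond the pseudo code. Python's `isalpha` and `isalnum` also accept non-ASCII letters and digits, which is more permissive than the letter and digit sets of the slides.

```python
def scan_identifier(text, pos=0):
    """Return (lexeme, next_pos) for an identifier starting at pos, or None."""
    if pos >= len(text) or not text[pos].isalpha():
        return None                      # state 1: no letter, so no identifier
    end = pos + 1                        # state 2 reached after the first letter
    while end < len(text) and text[end].isalnum():
        end += 1                         # loop on "letter or digit"
    return text[pos:end], end            # state 3: "other" symbol or end of input

print(scan_identifier("PI = 3"))
print(scan_identifier("= 3"))
```

For the input buffer of the example, the scanner stops at the space in position 3 (index 2 when counting from zero) and returns the lexeme PI.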
Generalization – Finite automata
Nondeterministic finite automaton (NFA) (S, Σ, move, s0, F)
1. Set of states S
2. Set of input symbols Σ
3. Transition relation $move : S \times (\Sigma \cup \{\varepsilon\}) \to 2^S$, relating state-symbol pairs to sets of states
4. Initial state s0 ∈ S
5. Set of end states F ⊆ S
NFA can be visualized in form of directed graph (transition graph)
Distinctions from transition diagram:
  Nondeterministic
  ε-moves allowed
NFA – Example
NFA for aa∗|bb∗ (language, accepting NFA):
({0, 1, 2, 3, 4}, {a, b}, move, 0, {2, 4}) with move:
  state  symbol  new state
  0      ε       {1, 3}
  1      a       {2}
  2      a       {2}
  3      b       {4}
  4      b       {4}
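The NFA above can be simulated by tracking the set of states it may currently be in. In this sketch the transition relation is a dictionary and ε is written as the empty string; the names `MOVE`, `eps_closure`, and `nfa_accepts` are chosen for illustration.

```python
# Transition relation of the NFA for aa*|bb*; "" stands for ε.
MOVE = {
    (0, ""): {1, 3},
    (1, "a"): {2},
    (2, "a"): {2},
    (3, "b"): {4},
    (4, "b"): {4},
}
START, FINAL = 0, {2, 4}

def eps_closure(states):
    """All states reachable from the given states via ε-moves only."""
    stack, closure = list(states), set(states)
    while stack:
        t = stack.pop()
        for u in MOVE.get((t, ""), set()):
            if u not in closure:
                closure.add(u)
                stack.append(u)
    return closure

def nfa_accepts(word):
    current = eps_closure({START})
    for symbol in word:
        reached = set()
        for s in current:
            reached |= MOVE.get((s, symbol), set())
        current = eps_closure(reached)
    return bool(current & FINAL)         # accept if some end state is reachable

for word in ["a", "aaa", "bb", "", "ab"]:
    print(repr(word), nfa_accepts(word))
```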
Deterministic finite automaton
Deterministic finite automaton (DFA) is a special case of an NFA whereby:
1. No state has an ε-move
2. For each state s and input symbol a there is at most one edge leaving s which is marked with a
DFAs contain at most one transition per state for each input symbol
Each entry in the transition table contains a single state (or is empty if there is no transition)
  State   Input symbol
          a    b
  0       1    2
DFA simulation
Input: Input string x, DFA D
Output: 'Yes' if D accepts x, 'No' otherwise
s := s0
c := nextchar
while c ≠ eof do
    s := move(s, c)
    if s == ⊥ then
        return 'No'
    c := nextchar
end
if s ∈ F then
    return 'Yes'
else
    return 'No'
end
DFA – Example
DFA for aa∗|bb∗:
({0, 1, 2}, {a, b}, move, 0, {1, 2}) whereby move:
  state  symbol  new state
  0      a       1
  0      b       2
  1      a       1
  2      b       2
Transition graph: 0 --a--> 1, 0 --b--> 2, loop labeled a at state 1, loop labeled b at state 2
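The DFA simulation algorithm can be run on this example DFA as follows. The transition table is a dictionary, and a missing entry plays the role of ⊥; the names `MOVE` and `dfa_accepts` are assumptions of this sketch.

```python
# Transition table of the DFA for aa*|bb*; missing entries mean "no transition".
MOVE = {
    (0, "a"): 1,
    (0, "b"): 2,
    (1, "a"): 1,
    (2, "b"): 2,
}
START, FINAL = 0, {1, 2}

def dfa_accepts(word):
    s = START
    for c in word:
        s = MOVE.get((s, c))             # None corresponds to the undefined move
        if s is None:
            return False
    return s in FINAL

for word in ["a", "aaa", "bb", "", "ab"]:
    print(repr(word), dfa_accepts(word))
```

Unlike the NFA simulation, only a single current state has to be tracked.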
Conversion NFA→ DFA
Subset construction algorithm
Idea:
1. Each DFA state corresponds to a set of NFA states
2. After reading a1a2...an the DFA is in a state which is equivalent to a subset of the NFA states. This subset contains the states the NFA can reach when reading a1a2...an.
Number of DFA states is exponential in the number of NFA states (worst case)
Subset construction
Input: NFA N = (S, Σ, move, s0, F)
Output: DFA D, accepting the same language as N
Initially, ε-closure(s0) is the only state in Dstates and it is unmarked
while there is an unmarked state T in Dstates do
    mark T
    for each input symbol a do
        U := ε-closure(move(T, a))
        if U ∉ Dstates then
            Add U as unmarked state to Dstates
        end
        Dtran[T, a] := U
    end
end
ε-closure, move
ε-closure(s): Set of NFA states which are accessible from state s via ε-moves only (including s itself)
ε-closure(T): Set of NFA states which are accessible from some state s ∈ T via ε-moves only
move(T, a): Set of NFA states which are accessible from some state s ∈ T via a single move on the input symbol a
![Page 50: Lecture Compiler Construction - Graz University of Technology · Concept of compilation – Analysis Operations used within program are stored in hierarchical structure Syntax Tree](https://reader031.vdocuments.us/reader031/viewer/2022022809/5e812dd06bc1ba56106df111/html5/thumbnails/50.jpg)
Calculation of ε-closure
Input: NFA N, set of states T
Output: ε-closure(T)

Push all states in T onto stack
Initialize ε-closure(T) to T
while stack ≠ ∅ do
    Pop t, the top element of stack
    for each state u with an edge from t to u labeled ε do
        if u ∉ ε-closure(T) then
            Add u to ε-closure(T)
            Push u onto stack
        end
    end
end
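The stack-based calculation translates almost line by line into Python. The sketch below also includes move. The edge list is an assumption: it is the textbook Thompson-style NFA for (a|b)*abb with states 0 to 10, chosen because it has the same shape as the example on the next slide, whose figure did not survive extraction. None is used as a hypothetical label for ε.

```python
EPS = None  # assumed label for epsilon-moves

# Assumed NFA for (a|b)*abb: (state, label) -> set of successor states
NFA = {
    (0, EPS): {1, 7},
    (1, EPS): {2, 4},
    (2, "a"): {3},
    (4, "b"): {5},
    (3, EPS): {6},
    (5, EPS): {6},
    (6, EPS): {1, 7},
    (7, "a"): {8},
    (8, "b"): {9},
    (9, "b"): {10},
}


def eps_closure(nfa, T):
    """Set of states reachable from any state in T via epsilon-moves only."""
    closure = set(T)   # initialize eps-closure(T) to T
    stack = list(T)    # push all states in T onto the stack
    while stack:
        t = stack.pop()
        for u in nfa.get((t, EPS), ()):
            if u not in closure:
                closure.add(u)
                stack.append(u)
    return closure


def move(nfa, T, a):
    """Set of states reachable from any state in T by one move on symbol a."""
    return {u for t in T for u in nfa.get((t, a), ())}


A = eps_closure(NFA, {0})              # start state of the DFA
B = eps_closure(NFA, move(NFA, A, "a"))
```

Here A is {0, 1, 2, 4, 7} and B is {1, 2, 3, 4, 6, 7, 8}, which are the first two DFA states the subset construction would produce for this NFA.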
![Page 51: Lecture Compiler Construction - Graz University of Technology · Concept of compilation – Analysis Operations used within program are stored in hierarchical structure Syntax Tree](https://reader031.vdocuments.us/reader031/viewer/2022022809/5e812dd06bc1ba56106df111/html5/thumbnails/51.jpg)
Example NFA→ DFA
[Figure: NFA with states 0 to 10, eight ε-moves, and moves labeled a, b and a, b, b]
![Page 52: Lecture Compiler Construction - Graz University of Technology · Concept of compilation – Analysis Operations used within program are stored in hierarchical structure Syntax Tree](https://reader031.vdocuments.us/reader031/viewer/2022022809/5e812dd06bc1ba56106df111/html5/thumbnails/52.jpg)
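Thompson’s Construction
Input: Regular expression r over an alphabet Σ
Output: NFA N, which accepts L(r)
[Figures: NFA fragments with initial state i and final state f for the cases ε, a ∈ Σ, st, s|t, s∗ and [s]. The fragments for st, s|t and s∗ are composed from the sub-automata N(s) and N(t); s|t and s∗ each use four ε-moves.]
whereby [s] = (s | ε)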
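A minimal Python sketch of the construction, working on a small tuple-based syntax tree instead of parsing regular expression text. Everything named here (NFABuilder, the node shapes, None as the ε label) is an assumption made for illustration. One deliberate simplification: for concatenation the sketch links the final state of N(s) to the initial state of N(t) with an ε-move instead of merging the two states, which accepts the same language but uses a few more states.

```python
EPS = None  # assumed label for epsilon-moves


class NFABuilder:
    """Builds NFA fragments from nodes such as ("sym", "a"), ("eps",),
    ("cat", s, t), ("or", s, t), ("star", s) and ("opt", s)."""

    def __init__(self):
        self.edges = {}       # (state, label) -> set of successor states
        self.num_states = 0

    def new_state(self):
        state = self.num_states
        self.num_states += 1
        return state

    def add_edge(self, src, label, dst):
        self.edges.setdefault((src, label), set()).add(dst)

    def build(self, r):
        """Return (i, f): initial and final state of the fragment for r."""
        kind = r[0]
        if kind == "opt":                      # [s] = (s | eps)
            return self.build(("or", r[1], ("eps",)))
        if kind == "cat":
            i, mid_f = self.build(r[1])
            mid_i, f = self.build(r[2])
            self.add_edge(mid_f, EPS, mid_i)   # simplification, see lead-in
            return i, f
        i, f = self.new_state(), self.new_state()
        if kind == "eps":
            self.add_edge(i, EPS, f)
        elif kind == "sym":
            self.add_edge(i, r[1], f)
        elif kind == "or":
            for sub in r[1:]:
                si, sf = self.build(sub)
                self.add_edge(i, EPS, si)
                self.add_edge(sf, EPS, f)
        elif kind == "star":
            si, sf = self.build(r[1])
            self.add_edge(i, EPS, si)          # enter N(s)
            self.add_edge(sf, EPS, f)          # leave N(s)
            self.add_edge(sf, EPS, si)         # repeat N(s)
            self.add_edge(i, EPS, f)           # skip N(s) entirely
        else:
            raise ValueError(f"unknown node kind: {kind}")
        return i, f


# (a|b)*abb as a nested tuple tree
A_OR_B_STAR = ("star", ("or", ("sym", "a"), ("sym", "b")))
REGEX = ("cat", ("cat", ("cat", A_OR_B_STAR, ("sym", "a")), ("sym", "b")), ("sym", "b"))
builder = NFABuilder()
start, final = builder.build(REGEX)
```

Each operator adds at most two new states, so the number of NFA states stays linear in the size of the expression.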
![Page 53: Lecture Compiler Construction - Graz University of Technology · Concept of compilation – Analysis Operations used within program are stored in hierarchical structure Syntax Tree](https://reader031.vdocuments.us/reader031/viewer/2022022809/5e812dd06bc1ba56106df111/html5/thumbnails/53.jpg)
NFA simulation
Input: NFA N, input string x
Output: ’Yes’ if N accepts x, ’No’ otherwise

S := ε-closure({s0})
a := nextChar
while a ≠ eof do
    S := ε-closure(move(S, a))
    a := nextChar
end
if S ∩ F ≠ ∅ then
    return ’Yes’
else
    return ’No’
end
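The simulation keeps only the current set S of NFA states, so no DFA is ever built. A short Python sketch under assumptions: the NFA is the textbook automaton for (a|b)*abb with states 0 to 10 (start 0, accepting state 10), stored as a dictionary, with None as a hypothetical label for ε; a Python string replaces the nextChar/eof interface.

```python
EPS = None  # assumed label for epsilon-moves

# Assumed NFA for (a|b)*abb: (state, label) -> set of successor states
NFA = {
    (0, EPS): {1, 7}, (1, EPS): {2, 4}, (2, "a"): {3}, (4, "b"): {5},
    (3, EPS): {6}, (5, EPS): {6}, (6, EPS): {1, 7},
    (7, "a"): {8}, (8, "b"): {9}, (9, "b"): {10},
}
START, ACCEPTING = 0, {10}


def eps_closure(nfa, states):
    """States reachable from `states` via epsilon-moves only."""
    closure, stack = set(states), list(states)
    while stack:
        t = stack.pop()
        for u in nfa.get((t, EPS), ()):
            if u not in closure:
                closure.add(u)
                stack.append(u)
    return closure


def nfa_accepts(nfa, start, accepting, x):
    """Return True ('Yes') if the NFA accepts x, False ('No') otherwise."""
    S = eps_closure(nfa, {start})
    for a in x:  # plays the role of a := nextChar until eof
        moved = {u for t in S for u in nfa.get((t, a), ())}
        S = eps_closure(nfa, moved)
    return bool(S & accepting)  # S ∩ F ≠ ∅
```

With this automaton, "abb", "aabb" and "babb" are accepted, while "", "ab" and "abba" are not.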
![Page 54: Lecture Compiler Construction - Graz University of Technology · Concept of compilation – Analysis Operations used within program are stored in hierarchical structure Syntax Tree](https://reader031.vdocuments.us/reader031/viewer/2022022809/5e812dd06bc1ba56106df111/html5/thumbnails/54.jpg)
Summary
Check whether an input string x is contained in the language defined by the regular expression r:
1 Construct NFA (Thompson’s construction) and apply the simulation algorithm for NFAs
2 Construct NFA (Thompson’s construction), convert NFA to DFA and apply the simulation algorithm for DFAs
Time-space tradeoffs

Automaton  Space      Time
NFA        O(|r|)     O(|r| · |x|)
DFA        O(2^|r|)   O(|x|)
![Page 55: Lecture Compiler Construction - Graz University of Technology · Concept of compilation – Analysis Operations used within program are stored in hierarchical structure Syntax Tree](https://reader031.vdocuments.us/reader031/viewer/2022022809/5e812dd06bc1ba56106df111/html5/thumbnails/55.jpg)
Regular expression→ DFA
Direct conversion
Notation:
NFA state s is important ↔ s has at least one non-ε-move to a successor
Extended (augmented) regular expression: (r)#
Extended expressions are illustrated via syntax trees (cat-node, or-node, star-node)
Each leaf node is labeled either with a symbol of the alphabet or with ε
Each leaf node not labeled with ε has a designated number (position)
Remark: The NFA-to-DFA conversion only needs the important states, which correspond to the positions
![Page 56: Lecture Compiler Construction - Graz University of Technology · Concept of compilation – Analysis Operations used within program are stored in hierarchical structure Syntax Tree](https://reader031.vdocuments.us/reader031/viewer/2022022809/5e812dd06bc1ba56106df111/html5/thumbnails/56.jpg)
Syntax tree – Example
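[Figure: syntax tree. The leaves a, b, a, b, b, # carry the positions 1 to 6; an or-node joins leaves 1 and 2 below a star-node, and a chain of cat-nodes appends leaves 3, 4, 5 and 6.]
Syntax tree for (a|b)∗abb#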
![Page 57: Lecture Compiler Construction - Graz University of Technology · Concept of compilation – Analysis Operations used within program are stored in hierarchical structure Syntax Tree](https://reader031.vdocuments.us/reader031/viewer/2022022809/5e812dd06bc1ba56106df111/html5/thumbnails/57.jpg)
Algorithm - Idea
Approach:
1 Create syntax tree for extended regular expression
2 Calculate four functions: nullable, firstpos, lastpos, followpos
3 Construct DFA using followpos
DFA states correspond to sets of positions (important NFA states)
For a position i, followpos(i) is the set of positions j for which there exists an input string . . . cd . . . where i corresponds to c and j corresponds to d
![Page 58: Lecture Compiler Construction - Graz University of Technology · Concept of compilation – Analysis Operations used within program are stored in hierarchical structure Syntax Tree](https://reader031.vdocuments.us/reader031/viewer/2022022809/5e812dd06bc1ba56106df111/html5/thumbnails/58.jpg)
followpos – Example
Syntax tree for (a|b)∗abb#
followpos(1) = {1, 2, 3}
Explanation: Suppose position 1 has just matched and the next symbol is an a. This a could belong either to (a|b)∗ (position 1) or to the a that follows the star (position 3). If the next symbol is a b, it has to belong to (a|b)∗ (position 2). Thus positions 1, 2, 3 are contained in followpos(1).
Informal definition of functions (node n, string s generated by the subexpression rooted at n)
firstpos . . . positions which can match the first symbol of s
lastpos . . . positions which can match the last symbol of s
nullable . . . true if the language generated by node n contains the empty string, false otherwise
![Page 59: Lecture Compiler Construction - Graz University of Technology · Concept of compilation – Analysis Operations used within program are stored in hierarchical structure Syntax Tree](https://reader031.vdocuments.us/reader031/viewer/2022022809/5e812dd06bc1ba56106df111/html5/thumbnails/59.jpg)
Rules for nullable, firstpos, lastpos
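Rules by type of node n:

Leaf n labeled ε:
    nullable(n) = true
    firstpos(n) = ∅
    lastpos(n) = ∅
Leaf n labeled with position i:
    nullable(n) = false
    firstpos(n) = {i}
    lastpos(n) = {i}
n = c1|c2:
    nullable(n) = nullable(c1) ∨ nullable(c2)
    firstpos(n) = firstpos(c1) ∪ firstpos(c2)
    lastpos(n) = lastpos(c1) ∪ lastpos(c2)
n = c1 • c2:
    nullable(n) = nullable(c1) ∧ nullable(c2)
    firstpos(n) = if nullable(c1) then firstpos(c1) ∪ firstpos(c2) else firstpos(c1)
    lastpos(n) = if nullable(c2) then lastpos(c1) ∪ lastpos(c2) else lastpos(c2)
n = c1∗:
    nullable(n) = true
    firstpos(n) = firstpos(c1)
    lastpos(n) = lastpos(c1)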
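The rules map onto three small recursive functions. The following Python sketch is an illustration under assumptions: syntax tree nodes are plain tuples of the shapes ("leaf", symbol, position), ("eps",), ("or", c1, c2), ("cat", c1, c2) and ("star", c1), and TREE is the syntax tree of (a|b)*abb# with positions 1 to 6.

```python
def nullable(n):
    kind = n[0]
    if kind in ("eps", "star"):
        return True
    if kind == "leaf":
        return False
    if kind == "or":
        return nullable(n[1]) or nullable(n[2])
    if kind == "cat":
        return nullable(n[1]) and nullable(n[2])
    raise ValueError(f"unknown node kind: {kind}")


def firstpos(n):
    kind = n[0]
    if kind == "eps":
        return set()
    if kind == "leaf":
        return {n[2]}
    if kind == "or":
        return firstpos(n[1]) | firstpos(n[2])
    if kind == "cat":
        if nullable(n[1]):
            return firstpos(n[1]) | firstpos(n[2])
        return firstpos(n[1])
    if kind == "star":
        return firstpos(n[1])
    raise ValueError(f"unknown node kind: {kind}")


def lastpos(n):
    kind = n[0]
    if kind == "eps":
        return set()
    if kind == "leaf":
        return {n[2]}
    if kind == "or":
        return lastpos(n[1]) | lastpos(n[2])
    if kind == "cat":
        if nullable(n[2]):
            return lastpos(n[1]) | lastpos(n[2])
        return lastpos(n[2])
    if kind == "star":
        return lastpos(n[1])
    raise ValueError(f"unknown node kind: {kind}")


# Syntax tree for (a|b)*abb#
STAR = ("star", ("or", ("leaf", "a", 1), ("leaf", "b", 2)))
TREE = ("cat", ("cat", ("cat", ("cat", STAR, ("leaf", "a", 3)),
                        ("leaf", "b", 4)), ("leaf", "b", 5)), ("leaf", "#", 6))
```

For this tree, firstpos(TREE) is {1, 2, 3} and lastpos(TREE) is {6}. The sketch recomputes values of subtrees on every call; a real implementation would more likely store them at the nodes.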
![Page 60: Lecture Compiler Construction - Graz University of Technology · Concept of compilation – Analysis Operations used within program are stored in hierarchical structure Syntax Tree](https://reader031.vdocuments.us/reader031/viewer/2022022809/5e812dd06bc1ba56106df111/html5/thumbnails/60.jpg)
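firstpos, lastpos – Example
[Figure: syntax tree for (a|b)∗abb# with each node annotated by its firstpos and lastpos sets]
Leaf with position i (i = 1, . . . , 6): firstpos = lastpos = {i}
or-node and star-node: firstpos = lastpos = {1,2}
cat-nodes: firstpos = {1,2,3}; lastpos = {3}, {4}, {5}, {6} from the innermost cat-node up to the root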
![Page 61: Lecture Compiler Construction - Graz University of Technology · Concept of compilation – Analysis Operations used within program are stored in hierarchical structure Syntax Tree](https://reader031.vdocuments.us/reader031/viewer/2022022809/5e812dd06bc1ba56106df111/html5/thumbnails/61.jpg)
Calculation followpos
1 If n is a cat-node (•), c1 is the left, c2 is the right child and i is a position in lastpos(c1), then all positions of firstpos(c2) are contained in followpos(i)
2 If n is a star-node (∗) with child c and i is contained in lastpos(n), then all positions of firstpos(c) are contained in followpos(i)

Node  followpos
1     {1,2,3}
2     {1,2,3}
3     {4}
4     {5}
5     {6}
6     -

[Figure: followpos-graph with nodes 1 to 6 and a directed edge from i to j for each j in followpos(i)]
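Both followpos rules can be applied during a single postorder pass over the syntax tree. The Python sketch below is an illustration under assumptions: nodes are tuples of the shapes ("leaf", symbol, position), ("eps",), ("or", c1, c2), ("cat", c1, c2) and ("star", c1), and the helper name analyze is hypothetical. The pass returns nullable, firstpos and lastpos for each node and fills the followpos table as a side effect.

```python
from collections import defaultdict


def analyze(node, followpos):
    """Return (nullable, firstpos, lastpos) of node and update followpos."""
    kind = node[0]
    if kind == "leaf":
        return False, {node[2]}, {node[2]}
    if kind == "eps":
        return True, set(), set()
    if kind == "or":
        n1, f1, l1 = analyze(node[1], followpos)
        n2, f2, l2 = analyze(node[2], followpos)
        return n1 or n2, f1 | f2, l1 | l2
    if kind == "cat":
        n1, f1, l1 = analyze(node[1], followpos)
        n2, f2, l2 = analyze(node[2], followpos)
        for i in l1:                 # rule 1: lastpos(c1) is followed by firstpos(c2)
            followpos[i] |= f2
        first = f1 | f2 if n1 else f1
        last = l1 | l2 if n2 else l2
        return n1 and n2, first, last
    if kind == "star":
        _, f1, l1 = analyze(node[1], followpos)
        for i in l1:                 # rule 2: lastpos loops back to firstpos
            followpos[i] |= f1
        return True, f1, l1
    raise ValueError(f"unknown node kind: {kind}")


# Syntax tree for (a|b)*abb#
STAR = ("star", ("or", ("leaf", "a", 1), ("leaf", "b", 2)))
TREE = ("cat", ("cat", ("cat", ("cat", STAR, ("leaf", "a", 3)),
                        ("leaf", "b", 4)), ("leaf", "b", 5)), ("leaf", "#", 6))

FOLLOWPOS = defaultdict(set)
ROOT_INFO = analyze(TREE, FOLLOWPOS)
```

For this tree the pass yields followpos(1) = followpos(2) = {1, 2, 3}, followpos(3) = {4}, followpos(4) = {5}, followpos(5) = {6}, and no entry for position 6.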
![Page 62: Lecture Compiler Construction - Graz University of Technology · Concept of compilation – Analysis Operations used within program are stored in hierarchical structure Syntax Tree](https://reader031.vdocuments.us/reader031/viewer/2022022809/5e812dd06bc1ba56106df111/html5/thumbnails/62.jpg)
followpos-graph→ NFA
An NFA without ε-moves can be created from a followpos-graph:
1 Mark all positions in firstpos of the root node of the syntax tree as initial states
2 Label each directed edge (i, j) with the symbol of position i
3 Mark the position which belongs to # as end state
⇒ The followpos-graph can be converted to a DFA (using subset construction)
![Page 63: Lecture Compiler Construction - Graz University of Technology · Concept of compilation – Analysis Operations used within program are stored in hierarchical structure Syntax Tree](https://reader031.vdocuments.us/reader031/viewer/2022022809/5e812dd06bc1ba56106df111/html5/thumbnails/63.jpg)
DFA-algorithm
Input: Regular expression r
Output: DFA D, accepting the language L(r)

Initially, the only unmarked state in Dstates is firstpos(root)
while there is an unmarked state T ∈ Dstates do
    Mark T
    for each input symbol a do
        Let U be the set of positions that are in followpos(p)
            for some position p ∈ T where the symbol of p is a.
        if U ≠ ∅ and U ∉ Dstates then
            Add U as unmarked state to Dstates
        end
        DTran(T, a) = U
    end
end
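A Python sketch of this algorithm for the running example (a|b)*abb#. It assumes the position symbols, the followpos table and firstpos(root) = {1, 2, 3} from the preceding slides as given data; the names SYMBOL, FOLLOWPOS, build_dfa and accepts are illustrative. One deviation from the pseudocode, chosen for the sketch: transitions with U = ∅ are simply left out of the table instead of being stored.

```python
SYMBOL = {1: "a", 2: "b", 3: "a", 4: "b", 5: "b", 6: "#"}
FOLLOWPOS = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: {5}, 5: {6}, 6: set()}
FIRSTPOS_ROOT = frozenset({1, 2, 3})
END_POSITION = 6  # position of the end marker #


def build_dfa(alphabet):
    """Return (dstates, dtran, start, accepting) built from followpos."""
    start = FIRSTPOS_ROOT
    dstates = {start}
    unmarked = [start]
    dtran = {}
    while unmarked:
        T = unmarked.pop()  # taking T out of the worklist marks it
        for a in alphabet:
            U = frozenset(q for p in T if SYMBOL[p] == a for q in FOLLOWPOS[p])
            if not U:
                continue    # no move on a; entry omitted in this sketch
            if U not in dstates:
                dstates.add(U)
                unmarked.append(U)
            dtran[(T, a)] = U
    accepting = {T for T in dstates if END_POSITION in T}
    return dstates, dtran, start, accepting


def accepts(text, dtran, start, accepting):
    """Simulate the constructed DFA on text."""
    state = start
    for symbol in text:
        if (state, symbol) not in dtran:
            return False
        state = dtran[(state, symbol)]
    return state in accepting


DSTATES, DTRAN, D_START, D_ACCEPTING = build_dfa("ab")
```

The result has four states, {1,2,3}, {1,2,3,4}, {1,2,3,5} and {1,2,3,6}; only the last one contains the position of # and is therefore accepting.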