![Page 1: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/1.jpg)
INF5110 – Compiler Construction
Spring 2016
1 / 672
![Page 2: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/2.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
2 / 672
![Page 3: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/3.jpg)
INF5110 – Compiler Construction
Introduction
Spring 2016
3 / 672
![Page 4: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/4.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
4 / 672
![Page 5: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/5.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
5 / 672
![Page 6: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/6.jpg)
Course info
Course presenters:• Martin Steffen ([email protected])• Stein Krogdahl ([email protected])• Birger Møller-Pedersen ([email protected])• Eyvind Wærstad Axelsen (oblig-ansvarlig,[email protected])
Course’s web-pagehttp://www.uio.no/studier/emner/matnat/ifi/INF5110
• overview over the course, pensum (watch for updates)• various announcements, beskjeder, etc.
6 / 672
![Page 7: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/7.jpg)
Course material and plan
• The material is based largely on [?], but also other sources willplay a role. A classic is “the dragon book” [?]
• see also Errata list athttp://www.cs.sjsu.edu/~louden/cmptext/
• approx. 3 hours teaching per week• mandatory assignments (= “obligs”)
• O1 published mid-February, deadline mid-March• O2 published beginning of April, deadline beginning of May
• group work up-to 3 people recommended. Please inform usabout such planned group collaboration
• slides: see updates on the net• exam: 8th June, 14:30, 4 hours.
7 / 672
![Page 8: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/8.jpg)
Motivation: What is CC good for?
• not everyone is actually building a full-blown compiler, but• fundamental concepts and techniques in CC• most, if not basically all, software reads, processes/transforms
and outputs “data”⇒ often involves techniques central to CC• Understanding compilers ⇒ deeper understanding of
programming language(s)• new language (domain specific, graphical, new language
paradigms and constructs. . . )⇒ CC & their principles will never be “out-of-fashion”.
8 / 672
![Page 9: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/9.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
9 / 672
![Page 10: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/10.jpg)
Architecture of a typical compiler
Figure: Structure of a typical compiler
10 / 672
![Page 11: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/11.jpg)
Anatomy of a compiler
11 / 672
![Page 12: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/12.jpg)
Pre-processor
• either separate program or integrated into compiler• nowadays: C-style preprocessing mostly seen as “hack” graftedon top of a compiler.1
• examples (see next slide):• file inclusion2
• macro definition and expansion3
• conditional code/compilation: Note: #if is not the same asthe if-programming-language construct.
• problem: often messes up the line numbers
1C-preprocessing is still considered sometimes a useful hack, otherwise itwould not be around . . . But it does not naturally encourage elegant andwell-structured code, just quick fixes for some situations.
2the single most primitive way of “composing” programs split into separatepieces into one program.
3Compare also to the \newcommand-mechanism in LATEX or the analogous\def-command in the more primitive TEX-language.
12 / 672
![Page 13: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/13.jpg)
C-style preprocessor examples
#inc lude <f i l ename>
Listing 1: file inclusion
#v a r d e f #a = 5 ; #c = #a+1. . .#i f (#a < #b)
. .#e l s e
. . .#end i f
Listing 2: Conditional compilation
13 / 672
![Page 14: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/14.jpg)
C-style preprocessor: macros
#macrodef hentdata (#1,#2)−−− #1−−−−#2−−−(#1)−−−
#endde f
. . .#hentdata ( k a r i , pe r )
Listing 3: Macros
−−− ka r i −−−−per −−−( k a r i )−−−
14 / 672
![Page 15: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/15.jpg)
Scanner (lexer . . . )
• input: “the program text” ( = string, char stream, or similar)• task
• divide and classify into tokens, and• remove blanks, newlines, comments ..
• theory: finite state automata, regular languages
15 / 672
![Page 16: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/16.jpg)
Scanner: illustration
a [ i nd e x ] ␣=␣4␣+␣2
lexeme token class valuea identifier "a"[ left bracketindex identifier "index"] right bracket= assignment4 number "4"+ plus sign2 number "2"
16 / 672
![Page 17: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/17.jpg)
Scanner: illustration
a [ i nd e x ] ␣=␣4␣+␣2
lexeme token class valuea identifier 2[ left bracketindex identifier 21] right bracket= assignment4 number 4+ plus sign2 number 2
012 "a"
⋮
21 "index"22
⋮
17 / 672
![Page 18: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/18.jpg)
Parser
18 / 672
![Page 19: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/19.jpg)
a[index] = 4 + 2: parse tree/syntax tree
expr
assign-expr
expr
subscript expr
expr
identifiera
[ expr
identifierindex
]
= expr
additive expr
expr
number4
+ expr
number2
19 / 672
![Page 20: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/20.jpg)
a[index] = 4 + 2: abstract syntax tree
assign-expr
subscript expr
identifiera
identifierindex
additive expr
number2
number4
20 / 672
![Page 21: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/21.jpg)
(One typical) Result of semantic analysis
• one standard, general outcome of semantic analysis:“annotated” or “decorated” AST
• additional info (non context-free):• bindings for declarations• (static) type information
assign-expr
additive-expr
number
2
number
4
subscript-expr
identifier
index
identifier
a :array of int :int
:array of int :int
:int :int
:int :int
:int :int
: ?
• here: identifiers looked up wrt. declaration• 4, 2: due to their form, basic types.
21 / 672
![Page 22: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/22.jpg)
Optimization at source-code level
assign-expr
subscript expr
identifiera
identifierindex
number6
t = 4+2;a[index] = t;
t = 6;a[index] = t;
a[index] = 6;
22 / 672
![Page 23: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/23.jpg)
Code generation & optimization
MOV R0 , i nd ex ; ; v a l u e o f i nd e x −> R0MUL R0 , 2 ; ; doub l e v a l u e o f R0MOV R1 , &a ; ; a dd r e s s o f a −> R1ADD R1 , R0 ; ; add R0 to R1MOV ∗R1 , 6 ; ; c on s t 6 −> add r e s s i n R1
MOV R0 , i nd ex ; ; v a l u e o f i nd e x −> R0SHL R0 ; ; doub l e v a l u e i n R0MOV &a [ R0 ] , 6 ; ; con s t 6 −> add r e s s a+R0
• many optimizations possible• potentially difficult to automatize4, based on a formaldescription of language and machine
• platform dependent
4not that one has much of a choice. Difficult or not, no one wants tooptimize generated machine code by hand . . . .
23 / 672
![Page 24: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/24.jpg)
Anatomy of a compiler (2)
24 / 672
![Page 25: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/25.jpg)
Misc. notions
• front-end vs. back-end, analysis vs. synthesis• separate compilation• how to handle errors?• “data” handling and management at run-time (static, stack,heap), garbage collection?
• language can be compiled in one pass?• E.g. C and Pascal: declarations must precede use• no longer too crucial, enough memory available
• compiler assisting tool and infra structure, e.g.• debuggers• profiling• project management, editors• build support• . . .
25 / 672
![Page 26: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/26.jpg)
Compiler vs. interpeter
Compilation• classically: source code ⇒ machine code for given machine• different “forms” of machine code (for 1 machine):
• executable ⇔ relocatable ⇔ textual assembler code
full interpretation• directly executed from program code/syntax tree• often used for command languages, interacting with OS etc.• speed typically 10–100 slower than compilation
compilation to intermediate code which is interpreted• used in e.g. Java, Smalltalk, . . . .• intermediate code: designed for efficient execution (byte codein Java)
• executed on a simple interpreter (JVM in Java)• typically 3–30 times slower than direct compilation• in Java: byte-code ⇒ machine code in a just-in time manner(JIT)
26 / 672
![Page 27: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/27.jpg)
More recent compiler technologies
• Memory has become cheap (thus comparatively large)• keep whole program in main memory, while compiling
• OO has become rather popular• special challenges & optimizations
• Java• “compiler” generates byte code• part of the program can be dynamically loaded during run-time
• concurrency, multi-core• graphical languages (UML, etc), “meta-models” besidesgrammars
27 / 672
![Page 28: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/28.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
28 / 672
![Page 29: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/29.jpg)
Compiling from source to target on host
“tombstone diagrams” (or T-diagrams). . . .
29 / 672
![Page 30: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/30.jpg)
Two ways to compose “T-diagrams”
30 / 672
![Page 31: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/31.jpg)
Using an “old” language and its compiler for write acompiler for a “new” one
31 / 672
![Page 32: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/32.jpg)
Pulling oneself up on one’s own bootstraps
bootstrap (verb, trans.): to promote or develop . . . withlittle or no assistance— Merriam-Webster
32 / 672
![Page 33: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/33.jpg)
Bootstrapping 2
33 / 672
![Page 34: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/34.jpg)
Porting & cross compilation
34 / 672
![Page 35: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/35.jpg)
INF5110 – Compiler Construction
Scanning
Spring 2016
35 / 672
![Page 36: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/36.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
36 / 672
![Page 37: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/37.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
37 / 672
![Page 38: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/38.jpg)
Scanner section overview
what’s a scanner?• Input: source code.a
• Output: sequential stream of tokensaThe argument of a scanner is often a file name or an input stream or similar.
• regular expressions to describe various token classes• (deterministic/nondeterminstic) finite-state automata (FSA,DFA, NFA)
• implementation of FSA• regular expressions → NFA• NFA ↔ DFA
38 / 672
![Page 39: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/39.jpg)
What’s a scanner?
• other names: lexical scanner, lexer, tokenizer
A scanner’s functionalityPart of a compiler that takes the source code as input andtranslates this stream of characters into a stream of tokens.
• char’s typically language independent.5
• tokens already language-specific.6
• works always “left-to-right”, producing one single token afterthe other, as it scans the input7
• it “segments” char stream into “chunks” while at the sametime “classifying” those pieces ⇒ tokens
5Characters are language-independent, but perhaps the encoding may vary,like ASCII, UTF-8, also Windows-vs.-Unix-vs.-Mac newlines etc.
6There are large commonalities across many languages, though.7No theoretical necessity, but that’s how also humans consume or “scan” a
source-code text. At least those humans trained in e.g. Western languages.39 / 672
![Page 40: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/40.jpg)
Typical responsibilities of a scanner
• segment & classify char stream into tokens• typically described by “rules” (and regular expressions)• typical language aspects covered by the scanner
• describing reserved words or key words• describing format of identifiers (= “strings” representing
variables, classes . . . )• comments (for instance, between // and NEWLINE)• white space
• to segment into tokens, a scanner typically “jumps over” whitespaces and afterwards starts to determine a new token
• not only “blank” character, also TAB, NEWLINE, etc.
• lexical rules: often (explicit or implicit) priorities• identifier or keyword? ⇒ keyword• take the longest possible scan that yields a valid token.
40 / 672
![Page 41: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/41.jpg)
“Scanner = Regular expressions (+ priorities)”
Rule of thumbEverything about the source code which is so simple that it can becaptured by reg. expressions belongs into the scanner.
41 / 672
![Page 42: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/42.jpg)
How does scanning roughly work?
. . . a [ i n d e x ] = 4 + 2 . . .
q0q1
q2
q3 ⋱
qn
Finite control
q2
Reading “head”(moves left-to-right)
a[index] = 4 + 2
42 / 672
![Page 43: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/43.jpg)
How does scanning roughly work?
. . . a [ i n d e x ] = 4 + 2 . . .
q0q1
q2
q3 ⋱
qn
Finite control
q0
Reading “head”(moves left-to-right)
a[index] = 4 + 2
43 / 672
![Page 44: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/44.jpg)
How does scanning roughly work?
. . . a [ i n d e x ] = 4 + 2 . . .
q0q1
q2
q3 ⋱
qn
Finite control
q1
Reading “head”(moves left-to-right)
a[index] = 4 + 2
44 / 672
![Page 45: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/45.jpg)
How does scanning roughly work?
• usual invariant in such pictures (by convention): arrow or headpoints to the first character to be read next (and thus afterthe last character having been scanned/read last)
• in the scanner program or procedure:• analogous invariant, the arrow corresponds to a specific
variable• contains/points to the next character to be read• name of the variable depends on the scanner/scanner tool
• the head in the pic: for illustration, the scanner does not reallyhave a “reading head”
• remembrance of Turing machines, or• the old times when perhaps the program data was stored on a
tape.8
8Very deep down, if one still has a magnetic disk (as opposed to SSD) thesecondary storage still has “magnetic heads”, only that one typically does notparse directly char by char from disk. . .
45 / 672
![Page 46: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/46.jpg)
The bad old times: Fortran
• in the days of the pioneers• main memory was smaaaaaaaaaall• compiler technology was not well-developed (or not at all)• programming was for very few “experts”.9
• Fortran was considered a very high-level language (wow, alanguage so complex that you had to compile it . . . )
9There was no computer science as profession or university curriculum.46 / 672
![Page 47: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/47.jpg)
(Slightly weird) lexical ascpects of Fortran
Lexical aspects = those dealt with a scanner
• whitespace without “meaning”:
I F( X 2. EQ. 0) TH E N vs. IF ( X2. EQ.0 ) THEN
• no reserved words!
IF (IF.EQ.0) THEN THEN=1.0
• general obscurity tolerated:DO99I=1,10 vs. DO99I=1.10
DO 99 I =1 ,10−
−
99 CONTINUE
47 / 672
![Page 48: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/48.jpg)
Fortran scanning: remarks
• Fortran (of course) has evolved from the pioneer days . . .• no keywords: nowadays mostly seen as bad idea10
• treatment of white-space as in Fortran: not done anymore:THEN and TH EN are different things in all languages
• however:11 both considered “the same”:
i f ␣b␣ then ␣ . . i f ␣␣␣b␣␣␣␣ then ␣ . .
• since concepts/tools (and much memory) were missing,Fortran scanner and parser (and compiler) were
• quite simplistic• syntax: designed to “help” the lexer (and other phases)
10It’s mostly a question of language pragmatics. The lexers/parsers wouldhave no problems using while as variable, but humans tend to have.
11Sometimes, the part of a lexer / parser which removes whitespace (andcomments) is considered as separate and then called screener. Not verycommon though.
48 / 672
![Page 49: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/49.jpg)
A scanner classifies
• “good” classification: depends also on later phases, may not beclear till later
Rule of thumbThings being treated equal in the syntactic analysis (= parser, i.e.,subsequent phase) should be put into the same category.
• terminology not 100% uniform, but most would agree:
Lexemes and tokensLexemes are the “chunks” (pieces) the scanner produces fromsegmenting the input source code (and typically droppingwhitespace). Tokens are the result of /classifying those lexemes.
• token = token name × token value
49 / 672
![Page 50: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/50.jpg)
A scanner classifies & does a bit more
• token data structure in OO settings• token themselves defined by classes (i.e., as instance of a class
representing a specific token)• token values: as attribute (instance variable) in its values
• often: scanner does slightly more than just classification• store names in some table and store a corresponding index as
attribute• store text constants in some table, and store corresponding
index as attribute• even: calculate numeric constants and store value as attribute
50 / 672
![Page 51: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/51.jpg)
One possible classification
name/identifier abc123integer constant 42real number constant 3.14E3text constant, string literal "this is a text constant"arithmetic op’s + - * /boolean/logical op’s and or not (alternatively /\ \/ )relational symbols <= < >= > = == !=
all other tokens: { } ( ) [ ] , ; := . etc.every one it its own group
• this classification: not the only possible (and not necessarilycomplete)
• note: overlap:• "." is here a token, but also part of real number constant• "<" is part of "<="
51 / 672
![Page 52: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/52.jpg)
One way to represent tokens in C
typedef s t r u c t {TokenType t o k en v a l ;char ∗ s t r i n g v a l ;i n t numval ;
} TokenRecord ;
If one only wants to store one attribute:
typedef s t r u c t {Tokentype t o k en v a l ;union{ char ∗ s t r i n g v a l ;
i n t numval} a t t r i b u t e ;
} TokenRecord ;
52 / 672
![Page 53: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/53.jpg)
How to define lexical analysis and implement a scanner?
• even for complex languages: lexical analysis (in principle) nothard to do
• “manual” implementation straightforwardly possible• specification (e.g., of different token classes) may be given in“prosa”
• however: there are straightforward formalisms and efficient,rock-solid tools available:
• easier to specify unambigously• easier to communicate the lexical definitions to others• easier to change and maintain
• often called parser generators typically not just generate ascanner, but code for the next phase (parser), as well.
53 / 672
![Page 54: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/54.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
54 / 672
![Page 55: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/55.jpg)
General concept: How to generate a scanner?
1. regular expressions to describe language’s lexical aspects• like whitespaces, comments, keywords, format of identifiers etc.• often: more “user friendly” variants of reg-expr are supported
to specify that phase2. classify the lexemes to tokens3. translate the reg-expressions ⇒ NFA.4. turn the NFA into a deterministic FSA (= DFA)5. the DFA can straightforwardly be implementated
• Above steps are done automatically by a “lexer generator”• lexer generators help also in other user-friendly ways ofspecifying the lexer: defining priorities, assuring that thelongest possible token is given back, repeat the processs togenerate a sequence of tokens12
• Step 2 is actually not covered by the classical Reg-expr = DFA= NFA results, it’s something extra.
12Maybe even prepare useful error messages if scanning (not scannergeneration) fails.
55 / 672
![Page 56: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/56.jpg)
Use of regular expressions
• regular languages: fundamental class of “languages”• regular expressions: standard way to describe regular languages• origin of regular expressions: one starting point is Kleene [?]but there had been earlier works outside “computer science”
• Not just used in compilers• often used for flexible “ searching ”: simple form of patternmatching
• e.g. input to search engine interfaces• also supported by many editors and text processing or scriptinglanguages (starting from classical ones like awk or sed)
• but also tools like grep or find
find . -name "*.tex"
• often extended regular expressions, for user-friendliness, nottheoretical expressiveness.
56 / 672
![Page 57: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/57.jpg)
Alphabets and languages
Definition (Alphabet Σ)Finite set of elements called “letters” or “symbols” or “characters”
Definition (Words and languages over Σ)Given alphabet Σ, a word over Σ is a finite sequence of letters fromΣ. A language over alphabet Σ is a set of finite words over Σ.
• in this lecture: we avoid terminology “symbols” for now, aslater we deal with e.g. symbol tables, where symbols meanssomething slighly different (at least: at a different level).
• Sometimes Σ left “implicit” (as assumed to be understoodfrom the context)
• practical examples of alphabets: ASCII, Norwegian letters(capital and non-capitals) etc.
57 / 672
![Page 58: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/58.jpg)
Languages
• note: Σ is finite, and words are of finite length• languages: in general infinite sets of words• Simple examples: Assume Σ = {a,b}• words as finite “sequences” of letters
• ε: the empty word (= empty sequence)• ab means “ first a then b ”
• sample languages over Σ are1. {} (also written as ∅) the empty set2. {a,b, ab}: language with 3 finite words3. {ε} (/= ∅)4. {ε, a, aa, aaa, . . .}: infinite languages, all words using only a ’s.5. {ε, a, ab, aba, abab, . . .}: alternating a’s and b’s6. {ab,bbab, aaaaa,bbabbabab, aabb, . . .}: ?????
58 / 672
![Page 59: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/59.jpg)
How to describe languages
• language mostly here in the abstract sense just defined.• the “dot-dot-dot” (. . .) is not a good way to describe to acomputer (and many humans) what is meant
• enumerating explicitly all allowed words for an infinitelanguage does not work either
NeededA finite way of describing infinite languages (which is hopefullyefficiently implementable & easily readable)
BewareIs it apriori clear to expect that all infinite languages can even becaptured in a finite manner?
• small metaphor
2.727272727 . . . 3.1415926 . . . (1)
59 / 672
![Page 60: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/60.jpg)
Regular expressions
Definition (Regular expressions)
A regular expression is one of the following1. a basic regular expression of the form a (with a ∈ Σ), or ε, or ∅2. an expression of the form r ∣ s, where r and s are regular
expressions.3. an expression of the form r s, where r and s are regular
expressions.4. an expression of the form r∗, where r is a regular expression.5. an expression of the form (r), where r is a regular expression.
Precedence (from high to low): ∗, concatenation, ∣
60 / 672
![Page 61: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/61.jpg)
A concise definition
later introduced as (notation for) context-free grammars:
r → ar → εr → ∅
r → r ∣ rr → r rr → r∗
r → (r)
(2)
61 / 672
![Page 62: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/62.jpg)
Same again
Notational conventionsLater, for CF grammars, we use capital letters to denote “variables”of the grammars (then called non-terminals). If we like to beconsistent with that convention, the definition looks as follows:
R → aR → εR → ∅
R → R ∣ RR → R RR → R∗
R → (R)
(3)
62 / 672
![Page 63: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/63.jpg)
Symbols, meta-symbols, meta-meta-symbols . . .
• regexps: notation or “language” to describe “languages” over agiven alphabet Σ (i.e. subsets of Σ∗)
• language being described ⇔ language used to describe thelanguage
⇒ language ⇔ meta-language• here:
• regular expressions: notation to describe regular languages• English resp. context-free notation:13 notation to describe
regular expression
• for now: carefully use notational convention for precision
13To be careful, we will (later) distinguish between context-free languages onthe one hand and notations to denote context-free languages on the other, inthe same manner that we now don’t want to confuse regular languages asconcept from particular notations (specifically, regular expressions) to writethem down.
63 / 672
![Page 64: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/64.jpg)
Notational conventions
• notational conventions by typographic means (i.e., differentfonts etc.)
• not easy discscernible, but: difference between• a and a• ε and ε• ∅ and ∅• ∣ and ∣ (especially hard to see :-))• . . .
• later (when gotten used to it) we may take a more “relaxed”attitude toward it, assuming things are clear, as do manytextbooks
• Note: in compiler implementations, the distinction betweenlanguage and meta-language etc. is very real (even if not doneby typographic means . . . )
64 / 672
![Page 65: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/65.jpg)
Same again once more
R → a ∣ ε ∣ ∅ basic reg. expr.∣ R ∣ R ∣ R R ∣ R∗ ∣ (R) compound reg. expr.
(4)
Note:• symbol ∣: as symbol of regular expressions• symbol ∣ : meta-symbol of the CF grammar notation• The meta-notation use here for regular expressions will be thesubject of later chapters
65 / 672
![Page 66: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/66.jpg)
Semantics (meaning) of regular expressions
Definition (Regular expression)
Given an alphabet Σ. The meaning of a regexp r (written L(r))over Σ is given by equation (5).
L(∅) = {} empty languageL(ε) = ε empty wordL(a) = {a} single “letter” from Σ
L(r ∣ s) = L(r) ∪L(s) alternativeL(r∗) = L(r)∗ iteration
(5)
• conventional precedences: ∗, concatenation, ∣.• Note: left of “=”: reg-expr syntax, right of “=”:semantics/meaning/math 14
14Sometimes confusingly “the same” notation.66 / 672
![Page 67: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/67.jpg)
Examples
In the following:• Σ = {a,b, c}.• we don’t bother to “boldface” the syntax
words with exactly one b (a ∣ c)∗b(a ∣ c)∗words with max. one b ((a ∣ c)∗) ∣ ((a ∣ c)∗b(a ∣ c)∗)
(a ∣ c)∗ (b ∣ ε) (a ∣ c)∗words of the form anban,i.e., equal number of a’sbefore and after 1 b
67 / 672
![Page 68: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/68.jpg)
Another regexpr example
words that do not contain two b’s in a row.(b (a ∣ c))∗ not quite there yet((a ∣ c)∗ ∣ (b (a ∣ c))∗)∗ better, but still not there
= (simplify)((a ∣ c) ∣ (b (a ∣ c)))∗ = (simplifiy even more)(a ∣ c ∣ ba ∣ bc)∗(a ∣ c ∣ ba ∣ bc)∗ (b ∣ ε) potential b at the end(notb ∣ notb b)∗(b ∣ ε) where notb ≜ a ∣ c
68 / 672
![Page 69: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/69.jpg)
Additional “user-friendly” notations
r+ = rr∗
r? = r ∣ ε
Special notations for sets of letters:
[0 − 9] range (for ordered alphabets)~a not a (everything except a). all of Σ
naming regular expressions (“regular definitions”)
digit = [0 − 9]nat = digit+
signedNat = (+∣−)natnumber = signedNat(”.”nat)?(E signedNat)?
69 / 672
![Page 70: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/70.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
70 / 672
![Page 71: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/71.jpg)
Finite-state automata
• simple “computational” machine• (variations of) FSA’s exist in many flavors and under differentnames
• other rather well-known names include finite-state machines,finite labelled transition systems,
• “state-and-transition” representations of programs or behaviors(finite state or else) are wide-spread as well
• state diagrams• Kripke-structures• I/O automata• Moore & Mealy machines
• the logical behavior of certain classes of electronic circuitrywith internal memory (“flip-flops”) is described by finite-stateautomata.15
15Historically, design of electronic circuitry (not yet chip-based, though) wasone of the early very important applications of finite-state machines.
71 / 672
![Page 72: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/72.jpg)
FSA
Definition (FSA)
A FSA A over an alphabet Σ is a tuple (Σ,Q, I ,F , δ)• Q: finite set of states• I ⊆ Q, F ⊆ Q: initial and final states.• δ ⊆ Q ×Σ ×Q transition relation
• final states: also called accepting states• transition relation: can equivalently be seen as functionδof Q ×Σ→ 2Q : for each state and for each letter, give backthe set of sucessor states (which may be empty)
• more suggestive notation: q1aÐ→ q2 for (q1, a,q2) ∈ δ
• We also use freely —self-evident, we hope— things like
q1aÐ→ q2
bÐ→ q3
72 / 672
![Page 73: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/73.jpg)
FSA as scanning machine?
• FSA have slightly unpleasant properties when considering themas decribing an actual program (i.e., a scannerprocedure/lexer)
• given the “theoretical definition” of acceptance:
Mental picture of a scanning automatonThe automaton eats one character after the other, and, whenreading a letter, it moves to a successor state, if any, of the currentstate, depending on the character at hand.
• 2 problematic aspects of FSA• non-determinism: what if there is more than one possible
successor state?• undefinedness: what happens if there’s no next state for a
given input• the second one is easily repaired, the first one requires morethought
73 / 672
![Page 74: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/74.jpg)
DFA: deterministic automata
Definition (DFA)
A deterministic, finite automaton A (DFA for short) over analphabet Σ is a tuple (Σ,Q, I ,F , δ)
• Q: finite set of states• I = {i} ⊆ Q, F ⊆ Q: initial and final states.• δof Q ×Σ→ Q transition function
• transition function: special case of transition relation:• deterministic• left-total16
16That means, for each pair q, a from Q ×Σ, δ(q, a) is defined. Some peoplecall an automaton where δ is not a left-total but a determinstic relation (or,equivalently, the function δ is not total, but partial) still a deterministicautomaton. In that terminology, the DFA as defined here would bedeterminstic and total.
74 / 672
![Page 75: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/75.jpg)
Meaning of an FSA
SemanticsThe intended meaning of an FSA over an alphabet Σ is the setconsisting of all the finite words, the automaton accepts.
Definition (Accepting words and language of an automaton)
A word c1c2 . . . cn with ci ∈ Σ is accepted by automaton A over Σ,if there exists states q0,q2, . . .qn all from Q such that
q0c1Ð→ q1
c2Ð→ q2c3Ð→ . . . qn−1
cnÐ→ qn ,
and were q0 ∈ I and qn ∈ F . The language of an FSA A, writtenL(A), is the set of all words A accepts
75 / 672
![Page 76: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/76.jpg)
FSA example
q0start q1 q2
a
b
a
b
c
76 / 672
![Page 77: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/77.jpg)
Example: identifiers
Regular expression
identifier = letter(letter ∣ digit)∗ (6)
startstart in_idletter
letter
digit
• transition function/relation δ not completely defined (= partialfunction)
77 / 672
![Page 78: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/78.jpg)
Example: identifiers
Regular expression
identifier = letter(letter ∣ digit)∗ (6)
startstart in_id
error
letter
other
letter
digitother
any
• transition function/relation δ not completely defined (= partialfunction)
78 / 672
![Page 79: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/79.jpg)
Automata for numbers: natural numbers
digit = [0 − 9]nat = digit+
(7)
startdigit
digit
79 / 672
![Page 80: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/80.jpg)
Signed natural numbers
signednat = (+ ∣ −)nat ∣ nat (8)
start+
−
digit
digit
digit
80 / 672
![Page 81: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/81.jpg)
Signed natural numbers: non-deterministic
start
start
+
−
digit
digit
digit
digit
81 / 672
![Page 82: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/82.jpg)
Fractional numbers
frac = signednat(”.”nat)? (9)
start+
−
digit
digit
digit
. digit
digit
82 / 672
![Page 83: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/83.jpg)
Floats
digit = [0 − 9]nat = digit+
signednat = (+ ∣ −)nat ∣ natfrac = signednat(”.”nat)?float = frac(E signednat)?
(10)
• Note: no (explicit) recursion in the definitions• note also the treatment of digit in the automata.
83 / 672
![Page 84: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/84.jpg)
DFA for floats
start+
−
digit
digit
digit
.E
digit
digit
E
+
−
digit
digit
digit
84 / 672
![Page 85: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/85.jpg)
DFAs for comments
Pascal-style
start{
other
}
C, C++, Java
start/ ∗
other
∗
∗
other
/
85 / 672
![Page 86: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/86.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
86 / 672
![Page 87: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/87.jpg)
Example: identifiers
Regular expression
identifier = letter(letter ∣ digit)∗ (6)
startstart in_idletter
letter
digit
• transition function/relation δ not completely defined (= partialfunction)
87 / 672
![Page 88: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/88.jpg)
Example: identifiers
Regular expression
identifier = letter(letter ∣ digit)∗ (6)
startstart in_id
error
letter
other
letter
digitother
any
• transition function/relation δ not completely defined (= partialfunction)
88 / 672
![Page 89: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/89.jpg)
Implementation of DFA (1)
startstart in_id finishletter
letter
digit
[other]
89 / 672
![Page 90: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/90.jpg)
Implementation of DFA (1): “code”
1 { s t a r t i n g s t a t e }23 i f the next c h a r a c t e r i s a l e t t e r4 then5 advance the i n pu t ;6 { now i n s t a t e 2 }7 whi le the next c h a r a c t e r i s a l e t t e r o r d i g i t8 do9 advance the i n pu t ;
10 { s t a y i n s t a t e 2 }11 end whi le ;12 { go to s t a t e 3 , w i thout advanc ing i npu t }13 accep t ;14 e l s e15 { e r r o r o r o t h e r c a s e s }16 end
90 / 672
![Page 91: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/91.jpg)
Explicit state representation
1 s t a t e := 1 { s t a r t }2 whi le s t a t e = 1 or 23 do4 case s t a t e of5 1 : case i n pu t c h a r a c t e r of6 l e t t e r : advance the i npu t ;7 s t a t e := 28 e l s e s t a t e := . . . . { e r r o r o r o t h e r } ;9 end case ;
10 2 : case i n pu t c h a r a c t e r of11 l e t t e r , d i g i t : advance the i npu t ;12 s t a t e := 2 ; { a c t u a l l y u n e s s e s s a r y }13 e l s e s t a t e := 3 ;14 end case ;15 end case ;16 end whi le ;17 i f s t a t e = 3 then accep t e l s e e r r o r ;
91 / 672
![Page 92: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/92.jpg)
Table representation of a DFA
aaaaaaastate
inputchar letter digit other
1 22 2 2 33
92 / 672
![Page 93: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/93.jpg)
Better table rep. of the DFA
aaaaaaastate
inputchar letter digit other accepting
1 2 no2 2 2 [3] no3 yes
add info for
• accepting or not• “non-advancing” transitions
• here: 3 can be reached from 2 via such a transition
93 / 672
![Page 94: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/94.jpg)
Table-based implementation
1 s t a t e := 1 { s t a r t }2 ch := next i n pu t c h a r a c t e r ;3 whi le not Accept [ s t a t e ] and not e r r o r ( s t a t e )4 do56 whi le s t a t e = 1 or 27 do8 newsta te := T [ s t a t e , ch ] ;9 { i f Advance [ s t a t e , ch ]
10 then ch := next i n pu t c h a r a c t e r } ;11 s t a t e := newsta te12 end whi le ;13 i f Accept [ s t a t e ] then accep t ;
94 / 672
![Page 95: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/95.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
95 / 672
![Page 96: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/96.jpg)
Non-deterministic FSA
Definition (NFA (with ε transitions))
A non-deterministic finite-state automaton (NFA for short) A overan alphabet Σ is a tuple (Σ,Q, I ,F , δ), where
• Q: finite set of states• I ⊆ Q, F ⊆ Q: initial and final states.• δof Q ×Σ→ 2Q transition function
In case, one uses the alphabet Σ + {ε}, one speaks about an NFAwith ε-transitions.
• in the following: NFA mostly means, allowing ε transitions17
• ε: treated differently than the “normal” letters from Σ.• δ can equivalently be interpreted as relation: δ ⊆ Q ×Σ ×Q(transition relation labelled by elements from Σ).
17It does not matter much anyhow, as we will see.96 / 672
![Page 97: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/97.jpg)
Language of an NFA
• Remember L(A) (Definition 7 on page 75)• applying definition directly to Σ + {ε}: accepting words“containing” letters ε
• as said: special treatment for ε-transitions/ε-“letters”. ε ratherrepresents absence of input character/letter.
Definition (Acceptance with ε-transitions)
A word w over alphabet Σ is accepted by an NFA withε-transitions, if there exists a word w ′ which is accepted by theNFA with alphabet Σ + {ε} accordingto Definition 7 and where w is w ′ with all occurrences of ε removed.
Alternative (but equivalent) intuitionA reads one character after the other (following its transitionrelation). If in a state with an outgoing ε-transition, A can move toa corresponding successor state without reading an input symbol.
97 / 672
![Page 98: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/98.jpg)
NFA vs. DFA
• NFA: often easier (and smaller) to write down, esp. startingfrom a reg expression.
• Non-determinism: not immediately transferable to an algo
starta
ε
a
ε
ε
b
starta
b b
b
98 / 672
![Page 99: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/99.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
99 / 672
![Page 100: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/100.jpg)
Why non-deterministic FSA?
Task: recognize ∶=, <=, and = as three different tokens:
start
start
start
return ASSIGN
return LE
return EQ
∶ =
< =
=
100 / 672
![Page 101: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/101.jpg)
What about the following 3 tokens?
start
start
start
return LE
return NE
return LT
< =
< >
<
101 / 672
![Page 102: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/102.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
102 / 672
![Page 103: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/103.jpg)
Regular expressions → NFA
• needed: a systematic translation• conceptually easiest: translate to NFA (with ε-transitions)
• postpone determinization for a second step• (postpone minimization for later, as well)
Compositional construction [?]Design goal: The NFA of a compound regular expression is given bytaking the NFA of the immediate subexpressions and connectingthem appropriately.
• construction slightly18 simpler, if one uses automata with onestart and one accepting state
⇒ ample use of ε-transitions
18does not matter much, though.103 / 672
![Page 104: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/104.jpg)
Illustration for ε-transitions
start
return ASSIGN
return LE
return EQ
∶ =
< =
=
ε
ε
ε
104 / 672
![Page 105: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/105.jpg)
Thompson’s construction: basic expressions
basic regular expressionsbasic (= non-composed) regular expressions: ε, ∅, a (for alla ∈ Σ)
startε
starta
105 / 672
![Page 106: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/106.jpg)
Thompson’s construction: compound expressions
. . .r
. . .sε
start
. . .r
. . .s
ε
ε
ε
ε
106 / 672
![Page 107: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/107.jpg)
Thompson’s construction: compound expressions: iteration
. . .r
ε
start
ε
107 / 672
![Page 108: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/108.jpg)
Example
starta
starta ε b
1start
2 3 4 5
8
6 7
ab ∣ a
ε
a ε b
ε
ε
a
ε
108 / 672
![Page 109: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/109.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
109 / 672
![Page 110: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/110.jpg)
Determinization: the subset construction
Main idea• Given a non-det. automaton A. To construct a DFA A:instead of backtracking: explore all successors “at the sametime” ⇒
• each state q′ in A: represents a subset of states from A• Given a word w : “feeding” that to A leads to the staterepresenting all states of A reachable via w .
• side remark: this construction, known also as powersetconstruction, seems straightforward enough, but: analogousconstructions works for some other kinds of automata, as well,but for others, the approach does not work.19
• Origin [?]
19For some forms of automata, non-deterministic versions are strictly moreexpressive than the deterministic one.
110 / 672
![Page 111: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/111.jpg)
Some notation/definitions
Definition (ε-closure, a-successors)Given a state q, the ε-closure of q, written closeε(a), is the set ofstates reachable via zero, one, or more ε-transitions. We write qafor the set of states, reachable from q with one a-transition. Bothdefinitions are used analogously for sets of states.
111 / 672
![Page 112: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/112.jpg)
Transformation process: sketch of the algo
Input: NFA A over a given ΣOutput: DFA A
1. the initial state: closeε(I ), where I are the initial states of A2. for a state Q ′ in A: the a-sucessor of Q is given by
closeε(Qa), i.e.,Q
aÐ→ closeε(Qa) (11)
3. repeat step 2 for all states in A and all a ∈ Σ, until no morestates are being added
4. the accepting states in A: those containing at least oneaccepting states of A.
112 / 672
![Page 113: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/113.jpg)
Example ab ∣ a
1start
2 3 4 5
8
6 7
ab ∣ a
ε
a ε b
ε
ε
a
ε
113 / 672
![Page 114: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/114.jpg)
Example ab ∣ a
1start
2 3 4 5
8
6 7
ab ∣ a
ε
a ε b
ε
ε
a
ε
{1,2,6}start {3,4,7,8} {5,8} ab ∣ aa b
114 / 672
![Page 115: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/115.jpg)
Example: identifiers
Remember: regexpr for identifies from equation (6)
1start 2 3 4
5 6
9
7 8
10letter ε ε
ε
ε
letter
ε
ε
ε
digit
ε
ε
115 / 672
![Page 116: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/116.jpg)
Identifiers: DFA
{1}start {2,3,4,5,7,10}
{4,5,6,7,9,10}
{4,5,7,8,9,10}
letter
letter
digit
digitletter
letter
digit
116 / 672
![Page 117: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/117.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
117 / 672
![Page 118: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/118.jpg)
Minimization
• automatic construction of DFA (via e.g. Thompson): oftenmany superfluous states
• goal: “combine” states of a DFA without changing theaccepted language
Properties of the minimization algoCanonicity: all DFA for the same language are transformed to the
same DFAMinimality: resulting DFA has minimal number of states
• “side effects”: answers to equivalence problems• given 2 DFA: do they accept the same language?• given 2 regular expressions, do they describe the same
language?
• modern version: [?].
118 / 672
![Page 119: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/119.jpg)
Hopcroft’s partition refinement algo for minimization
• starting point: complete DFA (i.e., error-state possibly needed)• first idea: equivalent states in the given DFA may be identified• equivalent: when used as starting point, accepting the samelanguage
• partition refinement:• works “the other way around”• instead of collapsing equivalent states:
• start by “collapsing as much as possible” and then,• iteratively, detect non-equivalent states, and then split a
“collapsed” state• stop when no violations of “equivalence” are detected
• partitioning of a set (of states):• worklist: data structure of to keep non-treated classes,termination if worklist is empty
119 / 672
![Page 120: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/120.jpg)
Partition refinement: a bit more concrete
• Initial partitioning: 2 partitions: set containing all acceptingstates F , set containing all non-accepting states Q/F
• Loop do the following: pick a current equivalence class Qi anda symbol a
• if for all q ∈ Qi , δ(q, a) is member of the same class Qj ⇒consider Qi as done (for now)
• else:• split Qi into Q1
i , . . .Qki s.t. the above situation is repaired for
each Q li (but don’t split more than necessary).
• be aware: a split may have a “cascading effect”: other classesbeing fine before the split of Qi need to be reconsidered ⇒worklist algo
• stop if the situation stabilizes, i.e., no more split happens (=worklist empty, at latest if back to the original DFA)
120 / 672
![Page 121: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/121.jpg)
Split in partition refinement: basic step
q1
q2
q3
q4
q5
q6
a
b
c
d
e
a
a
a
a
a
a
• before the split {q1,q2, . . . ,q6}• after the split on a: {q1,q2},{q3,q4,q5},{q6} 121 / 672
![Page 122: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/122.jpg)
Identifiers: DFA
{1}start {2,3,4,5,7,10}
{4,5,6,7,9,10}
{4,5,7,8,9,10}
letter
letter
digit
digitletter
letter
digit
122 / 672
![Page 123: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/123.jpg)
Completed automaton
{1}start {2,3,4,5,7,10}
{4,5,6,7,9,10}
{4,5,7,8,9,10}error
letter
letter
digit
digitletter
letter
digit
digit
123 / 672
![Page 124: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/124.jpg)
Minimized automaton (error state omitted)
startstart in_idletter
letter
digit
124 / 672
![Page 125: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/125.jpg)
Another example: partition refinement & error state
(a ∣ ε)b∗ (12)
1start 2
3
a
b
b
b
125 / 672
![Page 126: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/126.jpg)
Partition refinement
error state added
1start 2
3 error
a
b
b
b
a
a
126 / 672
![Page 127: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/127.jpg)
Partition refinement
initial partitioning
1start 2
3 error
a
b
b
b
a
a
127 / 672
![Page 128: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/128.jpg)
Partition refinement
split after a
1start 2
3 error
a
b
b
b
a
a
128 / 672
![Page 129: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/129.jpg)
End result (error state omitted again)
{1}start {2,3}
a
b
b
129 / 672
![Page 130: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/130.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
130 / 672
![Page 131: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/131.jpg)
Tools for generating scanners
• scanners: simple and well-understood part of compiler• hand-coding possible• mostly better off with: generated scanner• standard tools lex / flex (also in combination with parsergenerators, like yacc / bison
• variants exist for many implementing languages• based on the results of this section
131 / 672
![Page 132: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/132.jpg)
Main idea of (f)lex and similar
• output of lexer/scanner = input for parser• programmer specifies regular expressions for each token-classand corresponding actions20 (and whitespace, comments etc.)
• the spec. language offers some conveniences (extended regexprwith priorities, associativities etc) to ease the task
• automatically translated to NFA (e.g. Thompson)• then made into a deterministic DFA (“subset construction”)• minimized (with a little care to keep the token classes separate)• implement the DFA (usually with the help of a tablerepresentation)
20Tokens and actions of a parser will be covered later. For example,identifiers and digits as described but the reg. expressions, would end up in twodifferent token classes, where the actual string of characters (also known aslexeme) being the value of the token attribute.
132 / 672
![Page 133: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/133.jpg)
INF5110 – Compiler Construction
Grammars
19. 01. 2016
133 / 672
![Page 134: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/134.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
134 / 672
![Page 135: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/135.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
135 / 672
![Page 136: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/136.jpg)
Bird eye’s view of a parser
sequenceof tokens Parser tree repre-
sentation
• check that the token sequence correspond to a syntacticallycorrect program
• if yes: yield tree as intermediate representation for subsequentphases
• if not: give understandable error message(s)• we will encounter various kinds of trees
• derivation trees (derivation in a (context-free) grammar)• parse tree, concrete syntax tree• abstract syntax trees
• mentioned tree forms hang together• result of a parser: typically AST
136 / 672
![Page 137: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/137.jpg)
Sample syntax tree
program
stmts
stmt
assign-stmt
expr
+
var
y
var
x
var
x
decs
val=vardec
137 / 672
![Page 138: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/138.jpg)
Natural-language parse tree
S
NP
DT
The
N
dog
VP
V
bites
NP
NP
the
N
man
138 / 672
![Page 139: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/139.jpg)
“Interface” between scanner and parser
• remember: task of scanner = “chopping up” the input charstream (throw away white space etc) and classify the pieces (1piece = lexeme)
• classified lexeme = token• sometimes we use ⟨integer,”42”⟩
• integer: “class” or “type” of the token, also called token name• ”42” : value of the token attribute (or just value). Here, it’s
directly the lexeme (a string or sequence of chars)
• a note on (sloppyness/ease of) terminology: often: the tokenname is simply just called the token
• for (context-free) grammars: the token (symbol) corrrespondsthere to terminal symbols (or terminals, for short)
139 / 672
![Page 140: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/140.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
140 / 672
![Page 141: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/141.jpg)
Grammars
• in this chapter(s): focus on context-free grammars• thus here: grammar = CFG• as in the context of regular expressions/languages: language =(typically infinite) set of words
• grammar = formalism to unambiguously specify a language• intended language: all syntactically correct programs of agiven progamming language
SloganA CFG describes the syntax of a programming language. a
aand some say, regular expressions describe its microsyntax.
• note: a compiler will reject some syntactically correctprograms, whose violations cannot be captured by CFGs.
141 / 672
![Page 142: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/142.jpg)
Context-free grammar
Definition (CFG)A context-free grammar G is a 4-tuple G = (ΣT ,ΣN ,S ,P):1. 2 disjoint finite alphabets of terminals ΣT and2. non-terminals ΣN
3. 1 start-symbol S ∈ ΣN (a non-terminal)4. productions P = finite subset of ΣN × (ΣN +ΣT )∗
• terminal symbols: corresponds to tokens in parser = basicbuilding blocks of syntax
• non-terminals: (e.g. “expression”, “while-loop”,“method-definition” . . . )
• grammar: generating (via “derivations”) languages• parsing: the inverse problem⇒ CFG = specification
142 / 672
![Page 143: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/143.jpg)
BNF notation
• popular & common format to write CFGs, i.e., describecontext-free languages
• named after pioneering (seriously) work on Algol 60• notation to write productions/rules + some extrameta-symbols for convenience and grouping
Slogan: Backus-Naur formWhat regular expressions are for regular languages is BNF forcontext-free languages.
143 / 672
![Page 144: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/144.jpg)
“Expressions” in BNF
exp → expopexp ∣ ( exp ) ∣ numberop → + ∣ − ∣ ∗
(13)
• “→” indicating productions and “ ∣ ” indicating alternatives. 21
• convention: terminals written boldface, non-terminals italic• also simple math symbols like “+” and “(′′ are meant above asterminals.
• start symbol here: expr• remember: terminals like number correspond to tokens, resp.token classes. The attributes/token values are not relevanthere.
21The grammar can be seen as consisting of 6 productions/rules, 3 for exprand 3 for op, the ∣ is just for convenience. Side remark: Often also ∶∶= is usedfor →.
144 / 672
![Page 145: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/145.jpg)
Different notations
• BNF: notationally not 100% “standardized” across books/tools• “classic” way (Algol 60):
<exp> : := <exp> <op> <exp>| ( <exp> )| NUMBER
<op> : := + | − | ∗
• Extended BNF (EBNF) and yet another style
exp → exp ( ” + ” ∣ ” − ” ∣ ” ∗ ” ) exp∣ ”(”exp”)” ∣ ”number”
(14)
• note: parentheses as terminals vs. as metasymbols
145 / 672
![Page 146: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/146.jpg)
Different ways of writing the same grammar
• directly written as 6 pairs (6 rules, 6 productions) fromΣN × (ΣN ∪ΣT )∗, with “→” as nice looking “separator”:
exp → expopexpexp → ( exp )
exp → numberop → +
op → −
op → ∗
(15)
%%% Local Variables: %%% mode: latex %%% TEX-master:"../../main" %%% End:
• choice of non-terminals: irrelevant (except for humanreadability):
E → EOE ∣ (E) ∣ numberO → + ∣ − ∣ ∗
(16)
%%% Local Variables: %%% mode: latex %%% TEX-master:"../../main" %%% End:
• still: we count 6 productions
146 / 672
![Page 147: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/147.jpg)
Grammars as language generators
Deriving a word:Start from start symbol. Pick a “matching” rule to rewrite thecurrent word to a new one; repeat until terminal symbols, only.
• non-deterministic process• rewrite relation for derivations:
• one step rewriting: w1 ⇒ w2• one step using rule n: w1 ⇒n w2• many steps: ⇒∗ etc.
language of grammar G
L(G) = {s ∣ start ⇒∗ s and s ∈ Σ∗T}
147 / 672
![Page 148: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/148.jpg)
Example derivation for (number−number)∗number
exp ⇒ expopexp⇒ (exp)opexp⇒ (expopexp)opexp⇒ (nopexp)opexp⇒ (n−exp)opexp⇒ (n−n)opexp⇒ (n−n)∗exp⇒ (n−n)∗n
%%% Local Variables: %%% mode: latex %%% TEX-master:"../../main" %%% End:
• underline the “place” were a rule is used, i.e., an occurrence ofthe non-terminal symbol is being rewritten/expanded
• here: leftmost derivation22
22We’ll come back to that later, it will be important.148 / 672
![Page 149: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/149.jpg)
Rightmost derivation
exp ⇒ expopexp⇒ expopn⇒ exp∗n⇒ (expopexp)∗n⇒ (expopn)∗n⇒ (exp−n)∗n⇒ (n−n)∗n
%%% Local Variables: %%% mode: latex %%% TEX-master:"../../main" %%% End:
• other (“mixed”) derivations for the same word possible
149 / 672
![Page 150: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/150.jpg)
Some easy requirements for reasonable grammars
• all symbols (terminals and non-terminals): should occur in aword derivable from the start symbol
• words containing only non-terminals should be derivable• an example of a silly grammar G (start-symbol A)
A → BxB → AyC → z
• L(G) = ∅• those “sanitary conditions”: very minimal “common sense”requirements
150 / 672
![Page 151: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/151.jpg)
Parse tree
• derivation: if viewed as sequence of steps ⇒ linear “structure”• order of individual steps: irrelevant• ⇒ order not needed for subsequent steps• parse tree: structure for the essence of derivation• also called concrete syntax tree.23
1exp
2exp
n
3op
+
4exp
n
• numbers in the tree• not part of the parse tree, indicate order of derivation, only• here: leftmost derivation
23there will be abstract syntax trees as well.151 / 672
![Page 152: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/152.jpg)
Parse tree
• derivation: if viewed as sequence of steps ⇒ linear “structure”• order of individual steps: irrelevant• ⇒ order not needed for subsequent steps• parse tree: structure for the essence of derivation• also called concrete syntax tree.23
1exp
2exp
n
3op
+
4exp
n
• numbers in the tree• not part of the parse tree, indicate order of derivation, only• here: leftmost derivation
23there will be abstract syntax trees as well.152 / 672
![Page 153: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/153.jpg)
Parse tree
• derivation: if viewed as sequence of steps ⇒ linear “structure”• order of individual steps: irrelevant• ⇒ order not needed for subsequent steps• parse tree: structure for the essence of derivation• also called concrete syntax tree.23
1exp
2exp
n
3op
+
4exp
n
• numbers in the tree• not part of the parse tree, indicate order of derivation, only• here: leftmost derivation
23there will be abstract syntax trees as well.153 / 672
![Page 154: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/154.jpg)
Parse tree
• derivation: if viewed as sequence of steps ⇒ linear “structure”• order of individual steps: irrelevant• ⇒ order not needed for subsequent steps• parse tree: structure for the essence of derivation• also called concrete syntax tree.23
1exp
2exp
n
3op
+
4exp
n
• numbers in the tree• not part of the parse tree, indicate order of derivation, only• here: leftmost derivation
23there will be abstract syntax trees as well.154 / 672
![Page 155: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/155.jpg)
Parse tree
• derivation: if viewed as sequence of steps ⇒ linear “structure”• order of individual steps: irrelevant• ⇒ order not needed for subsequent steps• parse tree: structure for the essence of derivation• also called concrete syntax tree.23
1exp
2exp
n
3op
+
4exp
n
• numbers in the tree• not part of the parse tree, indicate order of derivation, only• here: leftmost derivation
23there will be abstract syntax trees as well.155 / 672
![Page 156: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/156.jpg)
Parse tree
• derivation: if viewed as sequence of steps ⇒ linear “structure”• order of individual steps: irrelevant• ⇒ order not needed for subsequent steps• parse tree: structure for the essence of derivation• also called concrete syntax tree.23
1exp
2exp
n
3op
+
4exp
n
• numbers in the tree• not part of the parse tree, indicate order of derivation, only• here: leftmost derivation
23there will be abstract syntax trees as well.156 / 672
![Page 157: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/157.jpg)
Parse tree
• derivation: if viewed as sequence of steps ⇒ linear “structure”• order of individual steps: irrelevant• ⇒ order not needed for subsequent steps• parse tree: structure for the essence of derivation• also called concrete syntax tree.23
1exp
2exp
n
3op
+
4exp
n
• numbers in the tree• not part of the parse tree, indicate order of derivation, only• here: leftmost derivation
23there will be abstract syntax trees as well.157 / 672
![Page 158: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/158.jpg)
Another parse tree (numbers for rightmost derivation)
1exp
4exp
(5exp
8exp
n
7op
−
6exp
n
)
3op
∗
2exp
n
158 / 672
![Page 159: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/159.jpg)
Another parse tree (numbers for rightmost derivation)
1exp
4exp
(5exp
8exp
n
7op
−
6exp
n
)
3op
∗
2exp
n
159 / 672
![Page 160: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/160.jpg)
Another parse tree (numbers for rightmost derivation)
1exp
4exp
(5exp
8exp
n
7op
−
6exp
n
)
3op
∗
2exp
n
160 / 672
![Page 161: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/161.jpg)
Another parse tree (numbers for rightmost derivation)
1exp
4exp
(5exp
8exp
n
7op
−
6exp
n
)
3op
∗
2exp
n
161 / 672
![Page 162: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/162.jpg)
Another parse tree (numbers for rightmost derivation)
1exp
4exp
(5exp
8exp
n
7op
−
6exp
n
)
3op
∗
2exp
n
162 / 672
![Page 163: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/163.jpg)
Another parse tree (numbers for rightmost derivation)
1exp
4exp
(5exp
8exp
n
7op
−
6exp
n
)
3op
∗
2exp
n
163 / 672
![Page 164: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/164.jpg)
Another parse tree (numbers for rightmost derivation)
1exp
4exp
(5exp
8exp
n
7op
−
6exp
n
)
3op
∗
2exp
n
164 / 672
![Page 165: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/165.jpg)
Another parse tree (numbers for rightmost derivation)
1exp
4exp
(5exp
8exp
n
7op
−
6exp
n
)
3op
∗
2exp
n
165 / 672
![Page 166: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/166.jpg)
Another parse tree (numbers for rightmost derivation)
1exp
4exp
(5exp
8exp
n
7op
−
6exp
n
)
3op
∗
2exp
n
166 / 672
![Page 167: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/167.jpg)
Another parse tree (numbers for rightmost derivation)
1exp
4exp
(5exp
8exp
n
7op
−
6exp
n
)
3op
∗
2exp
n
167 / 672
![Page 168: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/168.jpg)
Another parse tree (numbers for rightmost derivation)
1exp
4exp
(5exp
8exp
n
7op
−
6exp
n
)
3op
∗
2exp
n
168 / 672
![Page 169: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/169.jpg)
Another parse tree (numbers for rightmost derivation)
1exp
4exp
(5exp
8exp
n
7op
−
6exp
n
)
3op
∗
2exp
n
169 / 672
![Page 170: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/170.jpg)
Another parse tree (numbers for rightmost derivation)
1exp
4exp
(5exp
8exp
n
7op
−
6exp
n
)
3op
∗
2exp
n
170 / 672
![Page 171: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/171.jpg)
Another parse tree (numbers for rightmost derivation)
1exp
4exp
(5exp
8exp
n
7op
−
6exp
n
)
3op
∗
2exp
n
171 / 672
![Page 172: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/172.jpg)
Another parse tree (numbers for rightmost derivation)
1exp
4exp
(5exp
8exp
n
7op
−
6exp
n
)
3op
∗
2exp
n
172 / 672
![Page 173: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/173.jpg)
Abstract syntax tree
• parse tree: contains still unnecessary details• specifically: parentheses or similar used for grouping• tree-structure: can express the intended grouping already• remember: tokens contain also attibute values also (e.g.: fulltoken for token class n may contain lexeme like ”42” . . . )
1exp
2exp
n
3op
+
4exp
n
+
3 4
173 / 672
![Page 174: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/174.jpg)
Abstract syntax tree
• parse tree: contains still unnecessary details• specifically: parentheses or similar used for grouping• tree-structure: can express the intended grouping already• remember: tokens contain also attibute values also (e.g.: fulltoken for token class n may contain lexeme like ”42” . . . )
1exp
2exp
n
3op
+
4exp
n
+
3 4
174 / 672
![Page 175: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/175.jpg)
Abstract syntax tree
• parse tree: contains still unnecessary details• specifically: parentheses or similar used for grouping• tree-structure: can express the intended grouping already• remember: tokens contain also attibute values also (e.g.: fulltoken for token class n may contain lexeme like ”42” . . . )
1exp
2exp
n
3op
+
4exp
n
+
3 4
175 / 672
![Page 176: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/176.jpg)
Abstract syntax tree
• parse tree: contains still unnecessary details• specifically: parentheses or similar used for grouping• tree-structure: can express the intended grouping already• remember: tokens contain also attibute values also (e.g.: fulltoken for token class n may contain lexeme like ”42” . . . )
1exp
2exp
n
3op
+
4exp
n
+
3 4
176 / 672
![Page 177: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/177.jpg)
Abstract syntax tree
• parse tree: contains still unnecessary details• specifically: parentheses or similar used for grouping• tree-structure: can express the intended grouping already• remember: tokens contain also attibute values also (e.g.: fulltoken for token class n may contain lexeme like ”42” . . . )
1exp
2exp
n
3op
+
4exp
n
+
3 4
177 / 672
![Page 178: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/178.jpg)
Abstract syntax tree
• parse tree: contains still unnecessary details• specifically: parentheses or similar used for grouping• tree-structure: can express the intended grouping already• remember: tokens contain also attibute values also (e.g.: fulltoken for token class n may contain lexeme like ”42” . . . )
1exp
2exp
n
3op
+
4exp
n
+
3 4
178 / 672
![Page 179: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/179.jpg)
Abstract syntax tree
• parse tree: contains still unnecessary details• specifically: parentheses or similar used for grouping• tree-structure: can express the intended grouping already• remember: tokens contain also attibute values also (e.g.: fulltoken for token class n may contain lexeme like ”42” . . . )
1exp
2exp
n
3op
+
4exp
n
+
3 4
179 / 672
![Page 180: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/180.jpg)
AST vs. CST
• parse tree• important conceptual structure, to talk about grammars . . . ,• most likely not explicitly implemented in a parser
• AST is a concrete datastructure• important IR of the syntax of the language to be implemented• written in the meta-language used in the implementation• therefore: nodes like + and 3 are no longer tokens or lexemes• concrete data stuctures in the meta-language (C-structs,
instances of Java classes, or what suits best)• the figure is meant as schematic only• produced by the parser, used by later phases (often by more
than one)• note also: we use 3 in the AST, where lexeme was "3"⇒ at some point the lexeme string (for numbers) is translated to
a number in the meta-language (typically already by the lexer)
180 / 672
![Page 181: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/181.jpg)
Plausible schematic AST (for the other parse tree)
*
-
34 3
42
• this AST: rather “simplified” version of the CST• an AST closer to the CST (just dropping the parentheses):under certain circumstances nothing wrong with it either.
181 / 672
![Page 182: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/182.jpg)
Conditionals
Conditionals G1
stmt → if -stmt ∣ otherif -stmt → if ( exp ) stmt
→ if ( exp ) stmtelsestmtexp → 0 ∣ 1
(17)
182 / 672
![Page 183: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/183.jpg)
Parse tree
if (0 )otherelseother
stmt
if -stmt
if ( exp
0
) stmt
other
else stmt
other
183 / 672
![Page 184: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/184.jpg)
Another grammar for conditionals
Conditionals G2
stmt → if -stmt ∣ otherif -stmt → if ( exp ) stmtelse_part
else_part → elsestmt ∣ εexp → 0 ∣ 1
(18)
ε = empty word
184 / 672
![Page 185: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/185.jpg)
A further parse tree + an AST
stmt
if -stmt
if ( exp
0
) stmt
other
else_part
else stmt
other
COND
0 other other
NoteA missing else part may be represented by null-pointers inlanguages like Java.
185 / 672
![Page 186: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/186.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
186 / 672
![Page 187: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/187.jpg)
Ambiguous grammar
Definition (Ambiguous grammar)
A grammar is ambiguous if there exists a word with two differentparse trees.
Remember grammar from equation (13):
exp → expopexp ∣ ( exp ) ∣ numberop → + ∣ − ∣ ∗
Consider:
n−n∗n
187 / 672
![Page 188: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/188.jpg)
2 resulting ASTs
∗
−
34 3
42
−
34 ∗
3 42different parse trees ⇒ different24 ASTs ⇒ different25
meaning
Side remark: different meaningThe issue of “different meaning” may in practice be subtle: is(x + y) − z the same as x + (y − z)?
24At least in most cases.188 / 672
![Page 189: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/189.jpg)
2 resulting ASTs
∗
−
34 3
42
−
34 ∗
3 42different parse trees ⇒ different24 ASTs ⇒ different25
meaning
Side remark: different meaningThe issue of “different meaning” may in practice be subtle: is(x + y) − z the same as x + (y − z)? In principle yes, but whatabout MAXINT ?
24At least in most cases.189 / 672
![Page 190: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/190.jpg)
Precendence & associativity
• one way to make a grammar unambiguous (or less ambiguous)• For instance:
binary op’s precedence associativity+, − low left×, / higher left↑ highest right
• a ↑ b written in standard math as ab:
5 + 3/5 × 2 + 4 ↑ 2 ↑ 3 =5 + 3/5 × 2 + 42
3 =(5 + ((3/5 × 2)) + (4(23))) .
• mostly fine for binary ops, but usually also for unary ones(postfix or prefix)
190 / 672
![Page 191: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/191.jpg)
Unambiguity without associativity and precedence
• removing ambiguity by reformulating the grammar• precedence for op’s: precedence cascade
• some bind stronger than others (∗ more than +)• introduce separate non-terminal for each precedence level
(here: terms and factors)
191 / 672
![Page 192: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/192.jpg)
Expressions, revisited
• associativity• left-assoc: write the corresponding rules in left-recursive
manner, e.g.:
exp → expaddopterm ∣ term
• right-assoc: analogous, but right-recursive• non-assoc:
exp → termaddopterm ∣ term
factors and terms
exp → expaddopterm ∣ termaddop → + ∣ −
term → termmulopterm ∣ factormulop → ∗
factor → ( exp ) ∣ number
(19)
192 / 672
![Page 193: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/193.jpg)
34 − 3 ∗ 42
exp
exp
term
factor
n
addop
−
term
term
factor
n
mulop
∗
factor
n
193 / 672
![Page 194: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/194.jpg)
34 − 3 − 42
exp
exp
exp
term
factor
n
addop
−
term
factor
n
addop
−
term
factor
n
194 / 672
![Page 195: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/195.jpg)
Real life example
195 / 672
![Page 196: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/196.jpg)
Non-essential ambiguity
left-assoc
stmt-seq → stmt-seq;stmt ∣ stmtstmt → S
stmt-seq
stmt
S
; stmt-seq
stmt
S
; stmt-seq
stmt
S
196 / 672
![Page 197: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/197.jpg)
Non-essential ambiguity (2)
right-assoc representation instead
stmt-seq → stmt;stmt-seq ∣ stmtstmt → S
stmt-seq
stmt-seq
stmt-seq
stmt
S
; stmt
S
; stmt
S
197 / 672
![Page 198: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/198.jpg)
Possible AST representations
Seq
S S S
Seq
S S S
198 / 672
![Page 199: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/199.jpg)
Dangling else
Nested if’s
if (0 ) if (1 )otherelseother
Remember grammar from equation (17):
stmt → if -stmt ∣ otherif -stmt → if ( exp ) stmt
→ if ( exp ) stmtelsestmtexp → 0 ∣ 1
199 / 672
![Page 200: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/200.jpg)
Should it be like this
stmt
if -stmt
if ( exp
0
) stmt
if -stmt
if ( exp
1
) stmt
other
else stmt
other
200 / 672
![Page 201: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/201.jpg)
. . . or like this
stmt
if -stmt
if ( exp
0
) stmt
if -stmt
if ( exp
1
) stmt
other
else stmt
other
• common convention: connect else to closest “free” (=dangling) occurrence
201 / 672
![Page 202: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/202.jpg)
Unambiguous grammar
Grammar
stmt → matched_stmt ∣ unmatch_stmtmatched_stmt → if ( exp )matched_stmtelsematched_stmt
∣ otherunmatch_stmt → if ( exp ) stmt
∣ if ( exp )matched_stmtelseunmatch_stmtexp → 0 ∣ 1
• never have an unmatched statement inside a matched• complex grammar, seldomly used• instead: ambiguous one, with extra “rule”: connect each elseto closest free if
• alternative: different syntax, e.g.,• mandatory else,• or require endif
202 / 672
![Page 203: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/203.jpg)
CST
stmt
unmatch_stmt
if ( exp
0
) stmt
matched_stmt
if ( exp
1
) elsematched_stmt
other203 / 672
![Page 204: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/204.jpg)
Adding sugar: extended BNF
• make CFG-notation more “convenient” (but without moretheoretical expressiveness)
• syntactic sugar
EBNFMain additional notational freedom: use regular expressions on therhs of productions. They can contain terminals and non-terminals
• EBNF: officially standardized, but often: all “sugared” BNFsare called EBNF
• in the standard:• α∗ written as {α}• α? written as [α]
• supported (in the standardized form or other) by some parsertools, but not in all
• remember equation (14)204 / 672
![Page 205: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/205.jpg)
EBNF examples
A → β{α} for A→ Aα ∣ βA → {α}β for A→ αA ∣ β
stmt-seq → stmt {;stmt}stmt-seq → {stmt;} stmtif -stmt → if ( exp ) stmt[elsestmt]
greek letters: for non-terminals or terminals.
205 / 672
![Page 206: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/206.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
206 / 672
![Page 207: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/207.jpg)
Syntax diagrams
• graphical notation for CFG• used for Pascal• important concepts like ambiguity etc: not easily recognizable
• not much in use any longer• example for unsigned integer (taken from the TikZ manual):
uint . digit E
+
-
uint
207 / 672
![Page 208: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/208.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
208 / 672
![Page 209: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/209.jpg)
The Chomsky hierarchy
• linguist Noam Chomsky [Chomsky, 1956]• important classification of (formal) languages (sometimesChomsky-Schützenberger)
• 4 levels: type 0 languages – type 3 languages• levels related to machine models that generate/recognize them• so far: regular languages and CF languages
209 / 672
![Page 210: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/210.jpg)
Overview
rule format languages machines closed3 A→ aB , A→ a regular NFA, DFA all2 A→ α1βα2 CF pushdown
automata∪, ∗, ○
1 α1Aα2 → α1βα2 context-sensitive
(linearly re-stricted au-tomata)
all
0 α → β, α /= ε recursivelyenumerable
Turing ma-chines
all, exceptcomplement
Conventions• terminals a,b, . . . ∈ ΣN ,• non-terminals A,B, . . . ∈ ΣT
• general words α,β . . . ∈ (ΣT ∪ΣN)∗
210 / 672
![Page 211: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/211.jpg)
Phases of a compiler & hierarchy
“Simplified” design?1 big grammar for the whole compiler? Or at least a CSG for thefront-end, or a CFG combining parsing and scanning?
theoretically possible, but bad idea:
• efficiency• bad design• especially combining scanner + parser in one BNF:
• grammar would be needlessly large• separation of concerns: much clearer/ more efficient design
• for scanner/parsers: regular expressions + (E)BNF: simply theformalisms of choice!
• front-end needs to do more than checking syntax, CFGs notexpressive enough
• for level-2 and higher: situation gets less clear-cut, plain CSGnot too useful for compilers
211 / 672
![Page 212: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/212.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
212 / 672
![Page 213: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/213.jpg)
BNF-grammar for TINY
program → stmt-seqstmt-seq → stmt-seq;stmt ∣ stmt
stmt → if -stmt ∣ repeat-stmt ∣ assign-stmt∣ read -stmt ∣ write-stmt
if -stmt → ifexprthenstmtend∣ ifexprthenstmtelsestmtend
repeat-stmt → repeatstmt-sequntilexprassign-stmt → identifier ∶= exprread -stmt → read identifierwrite-stmt → write expr
expr → simple-exprcomparison-opsimple-expr ∣ simple-exprcomparison-op → < ∣ =
simple-expr → simple-expraddopterm ∣ termaddop → + ∣ −
term → termmulopfactor ∣ factormulop → ∗ ∣ /
factor → ( expr ) ∣ number ∣ identifier
213 / 672
![Page 214: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/214.jpg)
Syntax tree nodes
typedef enum {StmtK ,ExpK} NodeKind;typedef enum {IfK ,RepeatK ,AssignK ,ReadK ,WriteK} StmtKind;typedef enum {OpK ,ConstK ,IdK} ExpKind;
/* ExpType is used for type checking */typedef enum {Void ,Integer ,Boolean} ExpType;
#define MAXCHILDREN 3
typedef struct treeNode{ struct treeNode * child[MAXCHILDREN ];
struct treeNode * sibling;int lineno;NodeKind nodekind;union { StmtKind stmt; ExpKind exp;} kind;union { TokenType op;
int val;char * name; } attr;
ExpType type; /* for type checking of exps */
214 / 672
![Page 215: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/215.jpg)
Comments on C-representation
• typical use of enum type for that (in C)• enum’s in C can be very efficient• treeNode struct (records) is a bit “unstructured”• newer languages/higher-level than C: better structuringadvisable, especially for languages larger than Tiny.
• in Java-kind of languages: inheritance/subtyping and abstractclasses/interfaces often used for better structuring
215 / 672
![Page 216: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/216.jpg)
Sample Tiny program
read x; { input as integer }if 0 < x then { don ’t compute if x <= 0 }
fact := 1;repeat
fact := fact * x;x := x -1
until x = 0;write fact { output factorial of x }
end
216 / 672
![Page 217: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/217.jpg)
Same Tiny program again
read x ; { i npu t as i n t e g e r }i f 0 < x then { don ’ t compute i f x <= 0 }
f a c t := 1 ;r epea t
f a c t := f a c t ∗ x ;x := x −1
u n t i l x = 0 ;wr i t e f a c t { output f a c t o r i a l o f x }
end
• keywords / reserved words highlighted by bold-face type setting• reserved syntax like 0, :=, . . . is not bold-faced• comments are italicized
217 / 672
![Page 218: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/218.jpg)
Abstract syntax tree for a tiny program
218 / 672
![Page 219: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/219.jpg)
Some questions about the Tiny grammy
later given as assignment
• is the grammar unambiguous?• How can we change it so that the Tiny allows emptystatements?
• What if we want semicolons in between statements and notafter?
• What is the precedence and associativity of the differentoperators?
219 / 672
![Page 220: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/220.jpg)
INF5110 – Compiler Construction
Parsing
2. 02. 2016
220 / 672
![Page 221: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/221.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
221 / 672
![Page 222: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/222.jpg)
INF5110 – Compiler Construction
Semantic analysis
2. 02. 2016
222 / 672
![Page 223: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/223.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
223 / 672
![Page 224: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/224.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
224 / 672
![Page 225: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/225.jpg)
Overview over the chaptera
aSlides originally from Birger Møller-Pedersen
• semantics analysis in general• attribute grammars• symbol tables (not today)• data types and type checking (not today)
225 / 672
![Page 226: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/226.jpg)
Where are we now?
226 / 672
![Page 227: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/227.jpg)
What do we get from the parser?
• output of the parser: (abstract) syntax tree• often: in anticipation: nodes in the tree contain “space” to befilled out by SA
• examples:• for expression nodes: types• for identifier/name nodes: reference or pointer to the
declaration
assign-expr
subscript expr
identifiera
identifierindex
additive expr
number2
number4
227 / 672
![Page 228: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/228.jpg)
What do we get from the parser?
• output of the parser: (abstract) syntax tree• often: in anticipation: nodes in the tree contain “space” to befilled out by SA
• examples:• for expression nodes: types• for identifier/name nodes: reference or pointer to the
declaration
assign-expr
additive-expr
number
2
number
4
subscript-expr
identifier
index
identifier
a :array of int :int
:array of int :int
:int :int
:int :int
:int :int
: ?
228 / 672
![Page 229: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/229.jpg)
General remarks on semantic (or static) analysis
Rule of thumbCheck everything which is possible before executing (run-time vs.compile-time), but cannot already done during lexing/parsing(syntactical vs. semantical analysis)
• Goal: fill out “semantic” info (typically in the AST)• typically:
• names declared? (somewhere/uniquely/before use)• typing:
• declared type consistent with use• types of (sub)-expression consistent with used operations
• border between sematical vs. syntactic checking not always100% clear
• if a then ...: checked for syntax• if a + b then ...: semantical aspects as well?
229 / 672
![Page 230: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/230.jpg)
SA is nessessarily approximative
• note: not all can (precisely) be checked at compile-time25
• division by zero?• “array out of bounds”• “null pointer deref” (like r.a, if r is null)
• but note also: exact type cannot be determined staticallyeither
if x then 1 else "abc"• statically: ill-typeda
• dynamically (“run-time type”): string or int, or run-timetype error, if x turns out not to be a boolean, or if it’s null
aUnless some fancy behind-the-scence type conversions are done by thelanguage (the compiler). Perhaps print(if x then 1 else "abc") isaccepted, and integer 1 is implicitly converted to "1".
25For fundamental reasons (cf. also Rice’s theorem). Note thatapproximative checking is doable, resp. that’s what the SA is doing anyhow.
230 / 672
![Page 231: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/231.jpg)
SA remains tricky
A dream
However• no standard description language• no standard “theory” (apart from the toogeneral “context sensitive languages”)
• part of SA may seem ad-hoc, more “art”than “engineering”, complex
• but: well-established/well-founded (anddecidedly non-ad-hoc) fields do exist
• type systems, type checking• data-flow analysis . . . .
• in general• semantic “rules” must be invidiually
specified and implemented per language• rules: defined based on trees (for AST):
often straightforward to implement• clean language design includes clean
semantic rules231 / 672
![Page 232: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/232.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
232 / 672
![Page 233: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/233.jpg)
Attributes
Attribute• a “property” or characteristic feature of something• here: of language “constructs”. More specific in this chapter:• of syntactic elements, i.e., for non-terminals/terminal nodes insyntax trees
Static vs. dynamic• distinction between static and dynamic attributes• association attribute ↔ element: binding• static attributes: possible to determine at/determined atcompile time
• dynamic attributes: the others . . .
233 / 672
![Page 234: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/234.jpg)
Examples in our context
• data type of a variable : static/dynamic• value of an expression: dynamic (but seldomly static as well)• location of a variable in memory: typically dynamic (but in oldFORTRAN: static)
• object-code: static (but also: dynamic loading possible)
234 / 672
![Page 235: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/235.jpg)
Attribute grammar in a nutshell
• AG: general formalism to bind “attributes to trees” (wheretrees are given by a CFG)26
• two potential ways to calculate “properties” of nodes in a tree:
“Synthesize” propertiesdefine/calculate prop’s bottom-up
“Inherit” propertiesdefine/calculate prop’s top-down
• allows both at the same time
Attribute grammarCFG + attributes one grammar symbols + rules specifing for eachproduction, how to determine attributes
• evaluation of attributes: requires some thought, more complexif mixing bottom-up + top-down dependencies
26attributes in AG’s: static, obviously.235 / 672
![Page 236: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/236.jpg)
Example: evaluation of numerical expressions
Expression grammar (similar as seen before)
exp → exp + term ∣ exp − term ∣ termterm → term ∗ factor ∣ factor
factor → ( exp ) ∣ number
• goal now: evaluate a given expression, i.e., the syntax tree ofan expression, resp:
more concrete goalSpecify, in terms of the grammar, how expressions are evaluated
• grammar: describes the “format” or “shape” of (syntax) trees• syntax-directedness• value of (sub-)expressions: attribute here2727stated earlier: values of syntactic entities are generally dynamic attributes
and cannot therefore be treated by an AG. In this AG example it’s staticallydoable (because no variables, no state-change etc).
236 / 672
![Page 237: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/237.jpg)
Expression evaluation: how to do if on one’s own?
• simple problem, easy solvable without having heard of AGs• given an expression, in the form of a syntax tree• evaluation:
• simple bottom-up calculation of values• the value of a compound expression (parent node) determined
by the value of its subnodes• realizable, for example by a simple recursive procedure28
Connection to AG’s• AGs: basically a formalism to specify things like that• however: general AGs will allow more complex calculations:
• not just bottom up calculations like here but also• top-down, including both at the same timea
atop-down calculation will not be needed for the simple expression evaluationexample.
28resp. a number of mutually recursive procedures, one for factors, one forterms etc. See next slide
237 / 672
![Page 238: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/238.jpg)
Pseudo code for evaluation
eva l_exp ( e ) =case: : e e qu a l s PLUSnode −>
re tu rn eva l_exp ( e . l e f t ) + eval_term ( e . r i g h t ): : e e qu a l s MINUSnode −>
re tu rn eva l_exp ( e . l e f t ) − eval_term ( e . r i g h t ). . .end case
238 / 672
![Page 239: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/239.jpg)
AG for expression evaluation
productions/grammar rules semantic rules1 exp1 → exp2 + term exp1.val = exp2.val + term.val2 exp1 → exp2 − term exp1.val = exp2.val − term.val3 exp → term exp.val = term.val4 term1 → term2 ∗ factor term1.val = term2.val ∗ factor .val5 term → factor term.val = factor .val6 factor → ( exp ) factor .val = exp.val7 factor → number factor .val = number.val
• specific for this example• only one attribute (for all nodes), in general: different ones
possible• (related to that): only one semantic rule per production• as mentioned: rules here define values of attributes
“bottom-up” only
• note: subscripts on the symbols for disambiguation (whereneeded)
239 / 672
![Page 240: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/240.jpg)
Attributed parse tree
240 / 672
![Page 241: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/241.jpg)
First observations concerning the example AG
• attributes• defined per grammar symbol (mainly non-terminals), but• get they values “per node”• notation exp.val• if one wants to be precise: val is an attribute of non-terminal
exp (among others), val in an expression-node in the tree isan instance of that attribute
• instance not= the value!
241 / 672
![Page 242: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/242.jpg)
Semantic rules
• aka: attribution rule• fix for each symbol X : set of attributes29
• attribute: intended as “fields” in the nodes of syntax trees• notation: X .a: attribute a of symbol X• but: attribute obtain values not per symbol, but per node in atree (per instance)
Semantic rule for production X0 → X1 . . .Xn
Xi .aj = fij(X0.a1, . . . ,X0.ak ,X1.a1, . . .X1.ak , . . . ,Xn.a1, . . . ,Xn.ak)
• Xi on the left-hand side: not necessarily head symbol of theproduction X0
• evaluation example: more restricted (making example simple)29different symbols may share same attribute with the same name. Those
may have different types but the type of an attribute per symbol is uniform.Cf. fields in classes (and objects).
242 / 672
![Page 243: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/243.jpg)
Subtle point (forgotten by Louden): terminals
• terminals: can have attributes, yes,• but looking carefully at the format of semantic rules: not reallyspecified how terminals get values to their attribute (apartfrom inheriting them)
• dependencies for terminals• attribues of terminals: get value from the token, especially the
token value• terminal nodes: commonly not allowed to depend on parents,
siblings.
• i.e., commonly: only attributes “synthesized” from thecorresponding token allowed.
• note: without allowing “importing” values from the numbertoken to the number.val-attributes, the evaluation examplewould not work
243 / 672
![Page 244: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/244.jpg)
Attribute dependencies and dependence graph
Xi .aj = fij(X0.a1, . . . ,X0.ak ,X1.a1, . . .X1.ak , . . . ,Xn.a1, . . . ,Xn.ak)
• sem. rule: expresses dependence of attribute Xi .aj on the lefton all attributes Y .b on the right
• dependence of Xi .aj• in principle, Xi .aj : may depend on all attributes for all Xk of
the production• but typically: dependent only on a subset
Possible dependencies (> 1 rule per production possible)• parent attribute on childen attributes• attribute in a node dependent on other attribute of the node• child attribute on parent attribute• sibling attribute on sibling attribute• mixture of all of the above at the same time• but: no immediate dependence across generations
244 / 672
![Page 245: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/245.jpg)
Attribute dependence graph
• dependencies ultimate between attrributes in a syntax tree(instances) not between grammar symbols as such
⇒ attribute dependence graph (per syntax tree)• complex dependencies possible:
• evaluation complex• invalid dependencies possible, if not careful (especially cyclic)
245 / 672
![Page 246: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/246.jpg)
Sample dependence graph (for later example)
246 / 672
![Page 247: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/247.jpg)
Possible evaluation order
247 / 672
![Page 248: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/248.jpg)
Restricting dependencies
• general GAs allow bascially any kind of depenencies30
• complex/impossible to meaningfully evaluate (or understand)• typically: restrictions, disallowing “mixtures” of dependencies
• fine-grained: per attribute• or coarse-grained: for the whole attribute grammar
Synthesized attributesbottom-up dependencies only(same-node dependency allowed).
Inherited attributestop-down dependencies only(same-node and siblingdependencies allowed)
30apart from immediate cross-generation.248 / 672
![Page 249: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/249.jpg)
Synthesized attributes
Synthesized attributeA synthetic attribute is define wholly in terms of the node’s ownattributes, and those of its children (or constants).
Rule format for synth. attributesFor a synthesized attribute s of non-terminal A, all semantic ruleswith A.s on the left-hand side must be of the form
A.s = f (A.a,X1.b1, . . .Xn.bk)
and where the semantic rule belongs to production A→ X1 . . .Xn
• Slight simplification in the formula: only 1 attribute persymbol. In general, instead depend on A.a only , dependencieson A.a1, . . .A.al possible. Similarly for the rest of the formula
S-attributed grammar:all attributes are synthetic 249 / 672
![Page 250: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/250.jpg)
Remarks on the definition of synthesized attributes
• Note the following aspects1. a synthesized attribute in a symbol: cannot at the same time
also be “inherited”.2. a synthesized attribute:
• depends on attributes of children (and other attributes of thesame node) only. However:
• those attributes need not themselves be synthesized (see alsonext slide)
• in Louden:• he does not allow “intra-node” dependencies• he assumes (in his wordings): attributes are “globally unique”
250 / 672
![Page 251: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/251.jpg)
Alternative, more complex variant
“Transitive” definition
A.s = f (A.i1, . . . ,A.im,X1.s1, . . .Xn.sk)
• in the rule: the Xi .sj ’s synthesized, the Ai .ij ’s inherited• interpret the rule carefully: it says:
• it’s allowed to have synthesized & inherited attributes for A• it does not say: attributes in A have to be inherited (the Xi ’s
can be A as well)• it says: in A-node in the tree: a synthesized attribute
• can depend on inherited att’s in the same node and• on synthesized A-attributes of A-children-nodes
251 / 672
![Page 252: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/252.jpg)
Pictorial representation
Conventional depictionGeneral synthesized attributes
252 / 672
![Page 253: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/253.jpg)
Inherited attributes
Inherited attributeAn inherited attribute is defined wholly in terms of the node’s ownattributes, and those of its siblings or its parent node (or constants).
Rule format for inh. attributesFor an inherited attribute of a symbol X , all semantic rulesmentioning X .i on the left-hand side must be of the form
X .i = f (A.a,X1.b1, . . . ,X , . . .Xn.bk)
and where the semantic rule belongs to productionA→ X1 . . .X , . . .Xn
253 / 672
![Page 254: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/254.jpg)
Alternative definition
Rule formatFor an inherited attribute of a symbol X , all semantic rulesmentioning A.i on the left-hand side must be of the form
X .i = f (A.i′,X1.b1, . . . ,X , . . .Xn.bk)
and where the semantic rule belongs to productionA→ X1 . . .X , . . .Xn
• additional requirement: A.i′ inherited• rest of the attributes: inherited or synthesized
254 / 672
![Page 255: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/255.jpg)
Simplistic example (normally done by the scanner)
• not only done by the scanner, but relying on built-in functionof the implementing programming language. . .
CFG
number → numberdigit ∣ digitdigit → 0 ∣ 1 ∣ 2 ∣ 3 ∣ 4 ∣ 5 ∣ 6 ∣ 7 ∣ 8 ∣ 9 ∣
Attributes (just synthesized)
number valdigit valterminals none
255 / 672
![Page 256: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/256.jpg)
Numbers: Attribute grammar and attributed tree
A-grammar
A-tree
256 / 672
![Page 257: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/257.jpg)
Attribute evaluation: works on trees
i.e.: works equally well for• abstract syntax trees• ambiguous grammars
Seriously ambiguous expression grammara
aalternatively: grammar describing nice and cleans ASTs for an underlying,potentially less nice grammar used for parsing.
exp → exp + exp ∣ exp − exp ∣ exp ∗ exp ∣ ( exp ) ∣ number
257 / 672
![Page 258: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/258.jpg)
Evaluation: Attribute grammar and attributed tree
A-grammarA-tree
258 / 672
![Page 259: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/259.jpg)
Expressions: generating ASTs
Expression grammar with precedences & assoc.
exp → exp + term ∣ exp − term ∣ termterm → term ∗ factor ∣ factor
factor → ( exp ) ∣ number
Attributes (just synthesized)
exp, term, factor treenumber lexval
259 / 672
![Page 260: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/260.jpg)
Expressions: Attribute grammar and attributed tree
A-grammar
A-tree
260 / 672
![Page 261: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/261.jpg)
Example: type declarations for variable lists
CFG
decl → typevar -listtype → inttype → float
var -list1 → id ,var -list2var -list → id
• Goal: attribute type information to the syntax tree• attribute: dtype (with values integer and real)31
• complication: “top-down” information flow: type declared for alist of vars ⇒ inherited to the elements of the list
31There are thus 2 different values. We don’t mean “the attribute dtype hasinteger values”, like 0,1,2, . . .
261 / 672
![Page 262: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/262.jpg)
Types and variable lists: inherited attributes
grammar productions semantic rulesdecl → typevar -list var -list.dtype = type.dtypetype → int type.dtype = integertype → float type.dtype = real
var -list1 → id ,var -list2 id .dtype = var -list1.dtypevar -list2.dtype = var -list1.dtype
var -list → id id .dtype = var -list.dtype
• inherited: attribute for id and var -list• but also synthesized use of attribute dtype: for type.dtype32
32Actually, it’s conceptually better not to think of it as “the attribute dtype”,it’s better as “the attribute dtype of non-terminal type” (written type.dtype)etc. Note further: type.dtype is not yet what we called instance of anattribute.
262 / 672
![Page 263: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/263.jpg)
Types & var lists: after evaluating the semantic rules
floatid(x),id(y)
Attributed parse tree Dependence graph
263 / 672
![Page 264: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/264.jpg)
Example: Based numbers (octal & decimal)
• remember: grammar for numbers (in decimal notation)• evaluation: synthesized attributes• now: generalization to numbers with decimal and octalnotation
CFG
based -num → numbase-charbase-char → obase-char → d
num → numdigitnum → digitdigit → 0digit → 1
. . .digit → 7digit → 8digit → 9
264 / 672
![Page 265: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/265.jpg)
Based numbers: attributes
Attributes• based -num.val: synthesized• base-char .base: synthesized• for num:
• num.val: synthesized• num.base: inherited
• digit.val: synthesized
• 9 is not an octal character⇒ attribute val may get value “error ”!
265 / 672
![Page 266: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/266.jpg)
Based numbers: a-grammar
266 / 672
![Page 267: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/267.jpg)
Based numbers: after eval of the semantic rules
Attributed syntax tree
267 / 672
![Page 268: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/268.jpg)
Based nums: Dependence graph & possible evaluation order
268 / 672
![Page 269: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/269.jpg)
Dependence graph & evaluation
• evaluation order must respect the edges in the dependencegraph
• cycles must be avoided!• directed acyclic graph (DAG)33
• dependence graph ∼ partial order• topological sorting: turning a partial order to a total/linearorder (which is consistent with the PO)
• roots in the dependence graph (not the root of the syntaxtree): they value must come “from outside” (or constant)
• often (and sometimes required): terminals in the syntax tree:• terminals synthesized / not inherited⇒ terminals: roots of dependence graph⇒ get their value from the parser (token value)
33it’s not a tree. It may have more than one “root” (like a forest). Also:“shared descendents” are allows. But no cycles.
269 / 672
![Page 270: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/270.jpg)
Evaluation: parse tree method
For acyclic dependence graphs: possible “naive” approach
Parse tree methodLinearize the given partial order into a total order (topologicalsorting), and then simply evaluate the equations following that.
• works only if all dependence graphs of the AG are acyclic• acyclicity of the dependence graphs?
• decidable for given AG, but computationally expensive34
• don’t use general AGs but: restrict yourself to subclasses
• disadvantage of parse tree method: also not very efficientcheck per parse tree
34On the other hand: needs to be one only once.270 / 672
![Page 271: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/271.jpg)
Observation on the example: Is evalution (uniquely)possible?
• all attributes: either inherited or synthesized35
• all attribute: must actually be defined (by some rule)• guaranteed in that for every procuction:
• all synthesized attributes (on the left) are defined• all inherited attributes (on the right) are defined• local loops forbidden
• since all attributes are either inherited or synthesized: eachattribute in any parse tree: defined, and defined only one time(i.e., uniquely defined)
35base-char .base (synthesized) considered different from num.base(inherited)
271 / 672
![Page 272: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/272.jpg)
Loops
• a-grammars: allow to specify grammars where (some)parse-trees have cycles.
• however: loops intolerable for evaluation• difficult to check (exponential complexity).36
36acyclicity checking for a given dependence graph: not so hard (e.g., usingtopological sorting). Here: for all syntax trees.
272 / 672
![Page 273: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/273.jpg)
Variable lists (repeated)
Attributed parse tree Dependence graph
273 / 672
![Page 274: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/274.jpg)
Typing for variable lists
• code assume: tree given37
37reasonable, if AST. For parse-tree, the attribution of types must deal withthe fact that the parse tree is being built during parsing. It also means: “blur”border between context-free and context-sensitive analysis
274 / 672
![Page 275: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/275.jpg)
L-attributed grammars
• goal: attribute grammar suitable for “on-the-fly” attribution
Definition (L-attributed grammar)An attribute grammar for attributes a1, . . . ,ak is L-attributed, if foreach inherited attribute aj and each grammar rule
X0 → X1X2 . . .X ,
the associatied equations for aj are all of the form
Xi .aj = fij(X0.a⃗,X1.a⃗ . . .Xi−1.a⃗) .
where additionally for X0.a⃗, only inherited attributes are allowed.
• X⃗ .a: short-hand for X .a1 . . .X .ak• Note S-attributed grammar ⇒ L-attributed grammar
Discussion
275 / 672
![Page 276: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/276.jpg)
“Attribuation” and LR-parsing
• easy (and typical) case: synthesized attributes• for inherited attributes
• not quite so easy• perhaps better: not “on-the-fly”• i.e., better postponed for later phase, when AST available.
• implementation: additional value stack for synthesizedattributes, maintained “besides” the parse stack
276 / 672
![Page 277: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/277.jpg)
Example of a value stack for synthesized attributes
Sample action
E : E + E { $$ = $1 + $3 ; }
in (classic) yacc notation
Value stack manipulation
277 / 672
![Page 278: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/278.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
278 / 672
![Page 279: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/279.jpg)
Signed binary numbers (SBN)
SBN grammar
number → signlistsign → + ∣ −
list → listbit ∣ bitbit → 0 ∣ 1
Intended attributes
symbol attributesnumber valuesign negativelist position,valuebit position,value
• here: attributes for non-terminals (in general: terminals canalso be included)
279 / 672
![Page 280: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/280.jpg)
Attribute grammar SBN
production attribution rules1 number → signlist list.position = 0
if sign.negativethen number .value = −LIST .valueelse number .value = LIST .value
2 sign → + sign.negative = false3 sign → − sign.negative = true4 list → bit bit.position = list.position
list.value = bit.value5 list0 → list1bit list1.position = list0.position + 1
bit.position = list0.positionlist0.position = list1.value + bit.value
6 bit → 0 bit.value = 07 bit → 1 bit.value = 2bit.position
280 / 672
![Page 281: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/281.jpg)
INF5110 – Compiler Construction
Symbol tables
2. 02. 2016
281 / 672
![Page 282: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/282.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
282 / 672
![Page 283: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/283.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
283 / 672
![Page 284: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/284.jpg)
Symbol tables, in general
• central data structure• “data base” or repository associating properties with “names”(identifiers, symbols)
• declarations• constants• type declarationss• variable declarations• procedure-declarations• class declarations• . . .
• declaring occurrences vs. use occurrences of names (e.g.variables)
284 / 672
![Page 285: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/285.jpg)
Does my compiler need a symbol table?
• goal: associate attributes (properties) to syntactic elements(names/symbols)
• storing once calculated: (costs memory) ↔ recalculating ondemand (costs time)
• most often: storing prefered• but: can’t one store it in the nodes of the AST?
• remember: attribute grammar• however, fancy attribute grammars with many rules and
complex synthesized/inherited attribute (whose evaluationtraverses up and down and across the tree):
• might be intransparent• storing info in the tree: might not be efficient
⇒ central repository (= symbol table) better.
So: do I need a symbol table?In theory, alternatives exists; in practice, yes, symbol tables needed;most compilers do use symbol tables.
285 / 672
![Page 286: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/286.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
286 / 672
![Page 287: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/287.jpg)
Symbol table as abstract date type
• separate interface from implementation• ST: basically nothing else than a lookup-table or dictionary,• associating “keys” with “values”• here: keys = names (id’s, symbols), values the attribute(s)
Schematic interface: two core functions (+ more)• insert; add new binding• lookup: retrieve
besides the core functionality:• structure of (different?) name spaces in the implementedlanguage, scoping rules
• typically: not one single “flat” namespace ⇒ typically not onebig flat look-up table38
• ⇒ influence on the design/interface of the ST (and indirectlythe choice of implementation)
• necessary to “delete” or “hide” information (delete)38Neither conceptually nor the way it’s implemented. 287 / 672
![Page 288: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/288.jpg)
Two main philosophies
Traditional table(s)• central repository, separatefrom AST
• interface• lookup(name),• insert(name,decl),• delete(name)
• last 2: update ST fordeclarations and whenentering/exiting blocks
declarations in the AST nodes• to look-up ⇒ tree- search• insert/delete: implicit,depending on relativepositioning in the tree
• look-up:• potential lack of efficiency• however: optimizations
exist, e.g. “redundant”extra table (similar to thetraditional ST)
Here, for concreteness, (declarations/ are the attributes stored inthe ST. It’s not the only possible attribute stored. There may alsobe more than one ST.
288 / 672
![Page 289: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/289.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
289 / 672
![Page 290: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/290.jpg)
Data structures to implement a symbol table
• different ways to implement dictionaries (or look-up tables etc)
• simple (association) lists• trees
• balanced (AVL, B, red-black, binary-search trees)• hash tables, often method of choice• functional vs. imperative implementation
• careful choice influences efficiency• influenced also by the language being implemented,• in particular, its scoping rules (or the structure of the namespace in general) etc.39
39Also the language used for implementation (and the availability of librariestherein) may play a role (but remember “bootstrapping”)
290 / 672
![Page 291: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/291.jpg)
Nested block / lexical scope
for instance: C
{ i n t i ; . . . ; double d ;vo id p ( . . . ) ;{
i n t i ;. . .
}i n t j ;. . .
more later
291 / 672
![Page 292: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/292.jpg)
Blocks in other languages
TEX
\def \x{a}{
\def \x{b}\x
}\x\bye
LATEX
\ documentc l a s s { a r t i c l e }\newcommand{\x}{a}\begin {document}\x{\renewcommand{\x}{b}
\x}\end{document}
But: static vs. dynamic binding (see later)
292 / 672
![Page 293: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/293.jpg)
Hash tables
• classical and common implementation for STs• “hash table”:
• generic term itself, different general forms of HTs exists• e.g. separate chaining vs. open addressing40
Separate chaining
Code snippet
{i n t temp ;i n t j ;r e a l i ;vo id s i z e ( . . . . ) {
{. . . .
}}
}
40There is alternative terminology (cf. INF2220), under which separatechaining is also known as open hashing. The open addressing methods are alsocalled closed hashing. That’s how it is. 293 / 672
![Page 294: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/294.jpg)
Block structures in programming languages
• almost no language has one global namespace (at least not forvariables)
• pretty old concept, seriously started with ALGOL60.
block• “region” in the program code• delimited often by { and } or BEGIN and END• used to organize the scope of declarations (i.e., the namespace)
• nested blocks
294 / 672
![Page 295: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/295.jpg)
Block-structured scopes (in C)
i n t i , j ;
i n t f ( i n t s i z e ){ char i , temp ;
. . .{ double j ;
. .}. . .{ char ∗ j ;
. . .}
}
295 / 672
![Page 296: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/296.jpg)
Nested procedures in Pascal
program Ex ;va r i , j : i n t e g e r
f u n c t i o n f ( s i z e : i n t e g e r ) : i n t e g e r ;va r i , temp : char ;
procedure g ;va r j ; r e a l ;beg in
. . .end ;procedure h ;va r j : ^char ;beg in
. . .end ;
beg in (∗ f ’ s body ∗). . .
end ;beg in (∗ main program ∗)
. . .end . 296 / 672
![Page 297: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/297.jpg)
Block-strucured via stack-organized separate chaining
C code snippet
i n t i , j ;
i n t f ( i n t s i z e ){ char i , temp ;
. . .{ doub le j ;
. .}. . .{ char ∗ j ;
. . .}
}
“Evolution” of the hash table
297 / 672
![Page 298: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/298.jpg)
Using the syntax tree for lookup
l ookup ( s t r i n g n ) {k = naavae rende b lokkdo // l e t e t t e r n i d e c l t i l b l okk k ;k = k . s lu n t i l f unne t e l l e r k == none
}
298 / 672
![Page 299: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/299.jpg)
Alternative representation:
• arrangement different from 1 table with stack-organizedexternal chaining
• each block with one own hash table.41
• standard hashing within each block• static links to link the block levels⇒ “tree-of-hashtables”• AKA: sheaf-of-tables or chained symbol tables representation
41One may say: one symbol table per block, because the form oforganization can generally be done for symbol tables data structures (wherehash tables is just one of possible implementing data structure).
299 / 672
![Page 300: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/300.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
300 / 672
![Page 301: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/301.jpg)
Block-structured scoping with chained symbol tables
• remember the interface• look-up: following the static link (as seen)42
• Enter a block• create new (empty) symbol table• set static link from there to the “old” (= previously current)
one• set the current block to the newly created one
• at exit• move the current block one level up• note: no deletion of bindings, just made inaccessible
42The notion of static links will be encountered later again when dealing withrun-time environments (and for analogous purposes: identfying scopes in“block-stuctured” languages).
301 / 672
![Page 302: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/302.jpg)
Lexical scoping & beyond
• block-structured lexical scoping: central in programminglanguages (ever since ALGOL60 . . . )
• but: other scoping mechanism exists (and exist side-by-side)• example: C++
• member functions declared inside a class• defined outside
• still: method supposed to be able to access names defined inthe scope of the class definition (i.e., other members, e.g.using this)
C++class and member function
c l a s s A {. . . i n t f ( ) ; . . . // member f u n c t i o n
}
A : : f ( ) {} // de f . o f f ‘ ‘ i n ’ ’ A
Java analogon
c l a s s A {i n t f ( ) { . . . } ;boolean b ;vo id h ( ) { . . . } ;
}
302 / 672
![Page 303: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/303.jpg)
Scope resolution in C++
• class name introduces a name for the scope43 (not only in C++)• scope resolution operator ::• allows to explicitly refer to a “scope” ’
• to implement• such flexibility,• also for remote access like a.f()
• declarations must be kept separatly for each block (e.g. onehash table per class, record, etc., appropriately chained up)
43Besides that, class names are subject to scoping themselves, of course.303 / 672
![Page 304: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/304.jpg)
Same-level declarations
Same level
typedef i n t ii n t i ;
• often forbidden (for instancein C)
• insert: requires check (=lookup) first
Sequential vs. “collaterals declarations
i n t i = 1 ;vo id f ( vo id )
{ i n t i = 2 , j = i +1,. . .
}
l e t i = 1 ; ;l e t i = 2 and y = i +1; ;
p r i n t_ i n t ( y ) ; ;
304 / 672
![Page 305: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/305.jpg)
Recursive declarations/definitions
• for instance for functions/procedures• also classes and their members
Direct recursion
i n t gcd ( i n t n , i n t m) {i f (m == 0) re tu rn n ;e l s e re tu rn gcd (m, n % m) ;
}
• before treating the body,parser must add gcd into thesymbol table.
Indirect recursion/mutualrecursive def’s
vo id f ( vo id ) {. . . g ( ) . . . }
vo id g ( vo id ) {. . . f ( ) . . . }
305 / 672
![Page 306: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/306.jpg)
Mutual recursive defintions
vo id g ( vo id ) ; /∗ f u n c t i o n p r o t o t yp e d e c l . ∗/
vo id f ( vo id ) {. . . g ( ) . . . }
vo id g ( vo id ) {. . . f ( ) . . . }
• different solutions possible• Pascal: forward declarations• or: treat all function definitions (within a block or similar) asmutually recursive
• or: special grouping syntax
ocaml
l e t rec f ( x : i n t ) : i n t =g ( x+1)
and g ( x : i n t ) : i n t =f ( x +1) ; ;
Go
func f ( x i n t ) ( i n t ) {re tu rn g ( x ) +1
}
func g ( x i n t ) ( i n t ) {re tu rn f ( x ) −1
}306 / 672
![Page 307: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/307.jpg)
Static vs dynamic scope
• concentration so far:• lexical scoping/block structure, static binding• some minor complications/adaptations (recursion, duplicate
declarations, . . . )
• big variation: dynamic binding / dynamic scope• for variables: static binding/ lexical scoping the norm• however: cf. late-bound methods in OO
307 / 672
![Page 308: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/308.jpg)
Static scoping in C
Code snippet
#inc lude <s t d i o . h>
i n t i = 1 ;vo id f ( vo id ) {
p r i n t f ( "%d\n" , i ) ;}
vo id main ( vo id ) {i n t i = 2 ;f ( ) ;re tu rn 0 ;
}
• which value of i is printed then?
308 / 672
![Page 309: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/309.jpg)
Dynamic binding example
1 vo id Y ( ) {2 i n t i ;3 vo id P( ) {4 i n t i ;5 . . . ;6 Q( ) ;7 }8 vo id Q(){9 . . . ;
10 i = 5 ; // which i i s meant?11 }12 . . . ;1314 P ( ) ;15 . . . ;16 }
309 / 672
![Page 310: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/310.jpg)
Dynamic binding example
1 vo id Y ( ) {2 i n t i ;3 vo id P( ) {4 i n t i ;5 . . . ;6 Q( ) ;7 }8 vo id Q(){9 . . . ;
10 i = 5 ; // which i i s meant?11 }12 . . . ;1314 P ( ) ;15 . . . ;16 }
for dynamic binding: the one from line 4
310 / 672
![Page 311: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/311.jpg)
Static or dynamic?
TEX
\def \ a s t r i n g {a1}\def \x{\ a s t r i n g }\x{
\def \ a s t r i n g {a2}\x
}\x\bye
LATEX
\ documentc l a s s { a r t i c l e }\newcommand{\ a s t r i n g }{a1}\newcommand{\x}{\ a s t r i n g }\begin {document}\x{
\renewcommand{\ a s t r i n g }{a2}\x
}\x\end{document}
emacs lisp (/= Scheme)
( s e t q a s t r i n g "a1" )( defun x ( ) a s t r i n g )( x )( l e t ( ( a s t r i n g "a2" ) )
( x ) )
311 / 672
![Page 312: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/312.jpg)
Static binding is not about “value”
• the “static” in static binding is about• binding to the declaration / memory location,• not about the value
• nested functions used in the example (Go)• g declared inside f
package mainimport ( " fmt" )
var f = func ( ) {var x = 0var g = func ( ) { fmt . P r i n t f ( "␣x␣=␣%v" , x )}x = x + 1
{var x = 40 // l o c a l v a r i a b l eg ( )fmt . P r i n t f ( "␣x␣=␣%v" , x )}
}func main ( ) {
f ( )}
312 / 672
![Page 313: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/313.jpg)
Static binding can be come tricky
package mainimport ( " fmt" )
var f = func ( ) ( func ( i n t ) i n t ) {var x = 40 // l o c a l v a r i a b l evar g = func ( y i n t ) i n t { // ne s t ed f u n c t i o n
re tu rn x + 1}x = x+1 // update xre tu rn g // f u n c t i o n as r e t u r n v a l u e
}
func main ( ) {var x = 0var h = f ( )fmt . P r i n t l n ( x )var r = h (1)fmt . P r i n t f ( "␣ r ␣=␣%v" , r )
}
• example uses higher-order functions
313 / 672
![Page 314: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/314.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
314 / 672
![Page 315: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/315.jpg)
Expressions and declarations: grammar
Nested lets in ocaml
l e t x = 2 and y = 3 i n( l e t x = x+2 and y =
( l e t z = 4 i n x+y+z )i n ( x+y ) )
• simple grammar (using , for “collateral” declarations)
S → expexp → ( exp ) ∣ exp + exp ∣ id ∣ num ∣ let dec-list in exp
dec-list → dec-list, decl ∣ decldecl → id = exp
315 / 672
![Page 316: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/316.jpg)
Informal rules governing declarations
1. no identical names in the same let-block2. used names must be declared3. most-closely nested binding counts4. sequential (non-simultaneous) declaration (/= ocaml/ML)
l e t x = 2 , x = 3 i n x + 1 (∗ no , d u p l i c a t e ∗)
l e t x = 2 i n x+y (∗ no , y unbound ∗)
l e t x = 2 i n ( l e t x = 3 i n x ) (∗ d e c l . w i th 3 count s ∗)
l e t x = 2 , y = x+1 (∗ one a f t e r the o th e r ∗)i n ( l e t x = x+y ,
y = x+yi n y )
GoalDesign an attribute grammar (using a symbol table) specifyingthose rules. Focus on: error attribute.
316 / 672
![Page 317: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/317.jpg)
Attributes and ST interface
symbol attributes kindexp symtab inherited
nestlevel inheritederr synthesis
dec-list,decl intab inheritedouttab synthesizednestlevel inherited
id name injected by scanner
Symbol table functions• insert(tab,name,lev): returns a new table• isin(tab,name): boolean check• lookup(tab,name): gives back levela
• emptytable: you have to start somewhere• errtab: error from declaration (but not stored as attribute)aRealistically, more info would be stored as well (types etc) 317 / 672
![Page 318: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/318.jpg)
Attribute grammar (1): expressions
• note: expression in let’s can introduce scope themselves!• interpretation of nesting level: expressions vs. declarations4444I would not have recommended doing it like that (though it works)
318 / 672
![Page 319: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/319.jpg)
Attribute grammar (2): declarations
319 / 672
![Page 320: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/320.jpg)
Final remarks concerning symbol tables
• strings as symbols i.e., as keys in the ST: might be improved• name spaces can get complex in modern languages,• more than one “hierarchy”
• lexical blocks• inheritance or similar• (nested) modules
• not all bindings (of course) can be solved at compile time:dynamic binding
• can e.g. variables and types have same name (and still bedistinguished)
• overloading (see next slide)
320 / 672
![Page 321: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/321.jpg)
Final remarks: name resolution via overloading
• corresponds to “in abuse of notation” in textbooks• disambiguation not by context, but differently by “contexts”,“argument types” etc.
• variants :• method or function overloading• operator overloading• user defined?
i + j // i n t e g e r a d d i t i o nr + s // r e a l − a d d i t i o n
vo id f ( i n t i )vo id f ( i n t i , i n t j )vo id f ( double r )
321 / 672
![Page 322: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/322.jpg)
INF5110 – Compiler Construction
Types and type checking
2. 02. 2016
322 / 672
![Page 323: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/323.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
323 / 672
![Page 324: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/324.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
324 / 672
![Page 325: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/325.jpg)
General remarks and overview
• Goal here:• what are types?• static vs. dynamic typing• how to describe types syntactically• how to represent and use types in a compiler
• coverage of various types• basic types (often predefined/built-in)• type constructors• values of a type and operators• representation at run-time• run-time tests and special problems (array, union, record,
pointers)
• specification and implementation of type systems/typecheckers
• advanced concepts
325 / 672
![Page 326: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/326.jpg)
Why types?
• crucial user-visible abstraction describing program behavior.
• one view: type describes a set of (mostly related) values• static typing: checking/enforcing a type discipline at compiletime
• dynamic typing: same at run-time, mixtures possible• completely untyped languages: very rare, types were part ofPLs from the start.
Milner’s dictum (“type safety”)Well-typed programs cannot go wrong!
• strong typing:45 rigourously prevent “misuse” of data• types useful for later phases and optimizations• documentation and partial specification45Terminology rather fuzzy, and perhaps changed a bit over time.
326 / 672
![Page 327: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/327.jpg)
Types: in first approximation
Conceptually• semantic view: A set of values plus a set of correspondingoperations
• syntactiv view: notation to construct basic elements of thetype (it’s values) plus “procedures” operating on them
• compiler implementor’s view: data of the same type have sameunderlying memory representation
further classification:
• built-in/predefined vs. user-defined types• basic/base/elementary/primitive types vs. compound types• type constructors: building more compex types from simplerones
• reference vs. value types
327 / 672
![Page 328: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/328.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
328 / 672
![Page 329: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/329.jpg)
Some typical base types
base typesint 0, 1, . . . +,−,∗, / integersreal 5.05E4 . . . +,-,* real numbersbool true, false and or (|) . . . booleanschar ’a’ characters⋮
• often HW support for some of those (including many of theop’s)
• mostly: elements of int are not exactly mathematicalintegers, same for real
• often variations offered: int32, int64• often implicit conversions and relations between basic types
• which the type system has to specify/check for legality• which the compiler has to implement
329 / 672
![Page 330: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/330.jpg)
Some compound types
composed typesarray[0..9] of real a[i+1]list [], [1;2;3] concatstring "text" concat . . .struct / record r.x. . .
• mostly reference types• when built in, special “easy syntax” (same for basic built-intypes)
• 4 + 5 as opposed to plus(4,5)• a[6] as opposed to array_access(a, 6) . . .
• parser/lexer aware of built-in types/operators (specialprecedences, associativity etc)
• cf. functionality “built-in/predefined” via libraries
330 / 672
![Page 331: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/331.jpg)
Abstract data types
• unit of data together with functions/procedures/operations . . .operating on them
• encapsulation + interface• often: separation between exported and interal operations
• for instance public, private . . .• or via separate interfaces
• (static) classes in Java: may be used/seen as ADTs, methodsare then the “operations”
ADT begini n t ege r i ;r e a l x ;i n t proc t o t a l ( i n t a ) {
re tu rn i ∗ x + a // or : ‘ ‘ t o t a l = i ∗ x + a ’ ’}
end
331 / 672
![Page 332: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/332.jpg)
Type constructors: building new types
• array type• record type (also known as struct-types• union type• pair/tuple type• pointer type
• explict as in C• implict distinction between reference and value types, hidden
from programmer (e.g. Java)
• signatures (specifyingmethods/procedures/subroutines/functions) as type
• function type constructor, incl. higher-order types (infunctional languages)
• (names of) classes and subclasses• . . .
332 / 672
![Page 333: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/333.jpg)
Arrays
Array type
ar ray [< index t ype >] of <component type>
• elements (arrays) = (finite) functions from index-type tocomponent type
• allowed index-types:• non-negative (unsigned) integers?, from ... to ...?• other types?: enumerated types, characters
• things to keep in mind:• indexing outside the array bounds?• are the array bounds (statically) known to the compiler?• dynamic arrays (extensible at run-time)?
333 / 672
![Page 334: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/334.jpg)
One and more-dimensional arrays
• one-dimensional: effienctly implementable in standardhardware, (relative memory addressing, known offset)
• two or more dimensions
a r r a y [ 1 . . 4 ] of a r r a y [ 1 . . 3 ] of r e a la r r a y [ 1 . . 4 , 1 . . 3 ] of r e a l
• one can see it as “array of arrays” (Java), an array is typically areference type
• conceptually “two-dimensional”• linear layout in memory (dependent on the language)
334 / 672
![Page 335: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/335.jpg)
Records (“structs”)
s t r u c t {r e a l r ;i n t i ;
}
• values: “labelled tuples” (real× int)• constructing elements, e.g.• access (read or update): dot-notation x.i• implemenation: linear memory layout given by the (types ofthe) attributes
• attributes accessible by statically-fixed offsets• fast access• cf. objects as in Java
335 / 672
![Page 336: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/336.jpg)
Tuple/product types
• T1 ×T2 (or in ascii T_1 * T_2)• elements are tuples: for instance: (1, "text") is element ofint * string
• generalization to n-tuples:
value type(1, "text", true) int * string * bool(1, ("text", true)) int * (string * bool)
• structs can be seen as “labeled tuples”, resp. tuples as“anonymous structs”
• tuple types: common in functional languages,• in C/Java-like languages: n-ary tuple types often only implicitas input types for procedures/methods (part of the “signature”)
336 / 672
![Page 337: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/337.jpg)
Union types (C-style again)
union {r e a l r ;i n t i
}
• related to sum types (outside C)• (more or less) represents disjoint union of values of“participating” types
• access in C (confusingly enough): dot-notation u.i
337 / 672
![Page 338: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/338.jpg)
Union types in C and type safety
• union types is C: bad example for (safe) type disciplines, as it’ssimply type-unsafe, basically an unsafe hack . . .
• the union type (in C):• nothing much more than directive to allocate enough memory
to hold largest member of the union.• in the above example: real takes more space than int
• role of type here is more: implementor’s (= low level) focusand memory allocation need, not “proper usage focus” orassuring strong typing
⇒ bad example of modern use of types• better (type-safe) implementations known since⇒ variant record, “tagged”/“discriminated” union ) or even
inductive data types46
•46Basically: it’s union types done right plus possibility of “recursion”.
338 / 672
![Page 339: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/339.jpg)
Variant records from Pascal
record case i s R e a l : boolean oft rue : ( r : r e a l ) ;f a l s e : ( i : i n t ege r ) ;
• “variant record”• non-overlapping memory layout47
• type-safety-wise: not really of an improvement• programmer responsible to set and check the “discriminator”self
record case boolean oft rue : ( r : r e a l ) ;f a l s e : ( i : i n t ege r ) ;
47Again, it’s a implementor-centric, not user-centric view339 / 672
![Page 340: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/340.jpg)
Pointer types
• pointer type: notation in C: int*• “ * ”: can be seen as type constructor
i n t ∗ p ;
• random other languages: ^integer in Pascal, int ref in ML• value: address of (or reference/pointer to) values of theunderlying type
• operations: dereferencing and determining the address of andata item (and C allows “pointer arithmetic”)
var a : ^ i n t ege rvar b : i n t ege r. . .a := &i (∗ i an i n t va r ∗)
(∗ a := new i n t e g e r ok too ∗)b:= ^a + b
340 / 672
![Page 341: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/341.jpg)
Implicit dereferencing
• many languages: more or less hide existence of pointers• cf. reference types vs. value types often: automatic/implicitdereferencing
C r ; //C r = new C ( ) ;
• “sloppy” speaking: “ r is an object (which is an instance ofclass C /which is of type C)”,
• slighly more recise: variable “ r contains an object. . . ”• precise: variable “ r will contain a reference to an object”• r.field corresponds to something like “ (*r).field, similarin Simula
• programming with pointers:• “popular” source of errors• test for non-null-ness often required• explicit pointers: can lead to problems in block-structured
language (when handled non-expertly)• watch out for parameter passing• aliasing 341 / 672
![Page 342: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/342.jpg)
Function variables
program Funcvar ;var pv : Procedure ( x : i n t ege r ) ;
Procedure Q( ) ;var
a : i n t ege r ;Procedure P( i : i n t ege r ) ;begin
a:= a+i ; (∗ a def ’ ed o u t s i d e ∗)end ;
beginpv := @P; (∗ ‘ ‘ r e t u rn ’ ’ P , ∗)
end ; (∗ "@" dependent on d i a l e c t ∗)begin
Q( ) ;pv ( 1 ) ;
end .
342 / 672
![Page 343: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/343.jpg)
Function variables and nested scopes
• tricky part here: nested scope + function definition escapingsurrounding function/scope.
• here: inner procedure “returned” via assignment to functionvariable48
• think about stack discipline of dynamic memory management?• related also: functions allowed as return value?
• Pascal: not directly possible (unless one “returns” them viafunction-typed reference variables like here)
• C: possible, but nested function definitions not allowed
• combination of nested function definitions and functions asofficial return values (and arguments): higher-order functions
• Note: functions as arguments less problematic than as returnvalues.
48Let’s for the sake of the lecture, not distinguish conceptually betweenfunctions and procedures. But in Pascal, a procedure does not return a value,functions do.
343 / 672
![Page 344: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/344.jpg)
Function signatures
• define the “header” (also “signature”) of a function49
• in the discussion: we don’t distinguish mostly: functions,procedures, methods, subroutines.
• functional type (independent of the name f ): int→int
Modula-2
var f : procedure ( i n t ege r ) : i n t ege r ;
C
i n t (∗ f ) ( i n t )
• values: all functions . . . with the given signature• problems with block structure and free use of procedurevariables.
49Actually, an identfier of the function is mentioned as well.344 / 672
![Page 345: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/345.jpg)
Escaping: function var’s outside the block structure
1 program Funcvar ;2 var pv : Procedure ( x : i n t ege r ) ;34 Procedure Q( ) ;5 var6 a : i n t ege r ;7 Procedure P( i : i n t ege r ) ;8 begin9 a:= a+i ; (∗ a def ’ ed o u t s i d e ∗)
10 end ;11 begin12 pv := @P; (∗ ‘ ‘ r e t u rn ’ ’ P , ∗)13 end ; (∗ "@" dependent on d i a l e c t ∗)14 begin15 Q( ) ;16 pv ( 1 ) ;17 end .
• at line 15: variable a no longer exists• possible safe usage: only assign to such variables (here pv) anew value (= function) at the same blocklevel the variable isdeclared
• note: function parameters less problematic (stack-disciplinestill doable)
345 / 672
![Page 346: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/346.jpg)
Classes and subclasses
Parent class
c l a s s A {i n t i ;vo id f ( ) { . . . }
}
Subclass B
c l a s s B extends A {i n t ivo id f ( ) { . . . }
}
Subclass C
c l a s s C extends A {i n t ivo id f ( ) { . . . }
}
• classes resemble records, and subclasses variant types, butadditionally
• local methods possble (besides fields)• subclasses• objects mostly created dynamically, no references into the stack• subtyping and polymorphism (subtype polymorphism): a
reference typed by A can also point to B or C objects
• special problem: not really many, nil-pointer still possible
346 / 672
![Page 347: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/347.jpg)
Access to object members: late binding
• notation rA.i or rA.f()• dynamic binding, late-binding, virtual access, virtual access,dynamic dispatch . . . : all mean roughly the same
• central mechanism in almost all OO language, in connectionwith inheritance
Virtual access rA.f() (methods)“deepest” f in the run-time class of the object, rA points to(independent from the static class type of rA.
• remember: “most-closely nested” access of variables in nestedlexical block
• Java:• methods “in” objects are only dynamically bound• instance variables not, neither static methods “in” classes.
347 / 672
![Page 348: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/348.jpg)
Example
pub l i c c l a s s Shadow {pub l i c s t a t i c vo id main ( S t r i n g [ ] a r g s ){
C2 c2 = new C2 ( ) ;c2 . n ( ) ;
}}
c l a s s C1 {S t r i n g s = "C1" ;vo id m () {System . out . p r i n t ( t h i s . s ) ; }
}
c l a s s C2 extends C1 {S t r i n g s = "C2" ;vo id n ( ) { t h i s .m( ) ; }
}
348 / 672
![Page 349: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/349.jpg)
Inductive types in ML and similar
• type-safe and powerful• allows pattern matching
I s R e a l of r e a l | I s I n t e g e r of i n t
• allows recursive definitions ⇒ inductive data types:
type i n t_b i n t r e e =Node of i n t ∗ i n t_b i n t r e e ∗ b i n t r e e
| N i l
• Node, Leaf, IsReal: constructors (cf. languages like Java)• constructors used as discriminators in “union” types
type exp =Plus of exp ∗ exp
| Minus of exp ∗ exp| Number of i n t| Var of s t r i n g
349 / 672
![Page 350: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/350.jpg)
Recursive data types in C
does not work
s t r u c t intBST {i n t v a l ;i n t i s N u l l ;s t r u c t intBST l e f t , r i g h t ;
}
“indirect” recursion
s t r u c t intBST {i n t v a l ;s t r u c t intBST ∗ l e f t , ∗ r i g h t ;
} ;typedef s t r u c t intBST ∗ intBST ;
In Java: references implicit
c l a s s BSTnode {i n t v a l ;BSTnode l e f t , r i g h t ;
• note: implementation in ML: also uses pointers (but hiddenfrom the user)
• no nil-pointers in ML (and NIL is not a nil-point, it’s acosntructor)
350 / 672
![Page 351: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/351.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
351 / 672
![Page 352: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/352.jpg)
Example with interfaces
i n t e r f a c e I 1 { i n t m ( i n t x ) ; }i n t e r f a c e I 2 { i n t m ( i n t x ) ; }c l a s s C1 implements I 1 {
pub l i c i n t m( i n t y ) { re tu rn y++; }}c l a s s C2 implements I 2 {
pub l i c i n t m( i n t y ) { re tu rn y++; }}
pub l i c c l a s s Noduck1 {pub l i c s t a t i c vo id main ( S t r i n g [ ] a rg ) {
I 1 x1 = new C1 ( ) ; // I 2 not p o s s i b l eI 2 x2 = new C2 ( ) ;x1 = x2 ;
}}
analogous effects when using classes in their roles as types
352 / 672
![Page 353: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/353.jpg)
Structural vs. nominal equality
a, b
va r a , b : r e c o r di n t i ;double d
end
c
va r c : r e c o r di n t i ;double d
end
typedef
typedef i dReco rd : r e c o r di n t i ;double d
end
va r d : i dReco rd ;va r e : i dReco rd ; ;
what’s possible?
a := c ;a := d ;
a := b ;d := a ;
353 / 672
![Page 354: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/354.jpg)
Types in the AST
• types are part of the syntax, as well• represent: either in a separate symbol table, or part of the AST
Record type
r e c o r dx : p o i n t e r to r e a l ;y : a r r a y [ 1 0 ] of i n t
end
procedure header
proc ( bool ,union a : r e a l ; b : cha r end ,i n t ) : vo id
end
354 / 672
![Page 355: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/355.jpg)
Structured types without names
var -decls → var -decls;var -decl ∣ var -declvar -decl → id ∶ type-exptype-exp → simple-type ∣ structured -type
simple-type → int ∣ bool ∣ real ∣ char ∣ voidstructured -type → array [num ]of type-exp
∣ recordvar -declsend∣ unionvar -declsend∣ pointertotype-exp∣ proc ( type-exps ) type-exp
type-exps → type-exps,type-exp ∣ type-exp
355 / 672
![Page 356: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/356.jpg)
Structural equality
356 / 672
![Page 357: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/357.jpg)
Types with names
var -decls → var -decls;var -decl ∣ var -declvar -decl → id ∶ simple-type-exp
type-decls → type-decls;type-decl ∣ type-decltype-decl → id = type-exptype-exp → simple-type-exp ∣ structured -type
simple-type-exp → simple-type ∣ idsimple-type → int ∣ bool ∣ real ∣ char ∣ void
structured -type → array [num ]of simple-type-exp∣ recordvar -declsend∣ unionvar -declsend∣ pointertosimple-type-exp∣ proc ( type-exps ) simple-type-exp
type-exps → type-exps,simple-type-exp ∣ simple-type-exp
357 / 672
![Page 358: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/358.jpg)
Name equality
• all types have “names”, and two types are equal iff their namesare equal
• type equality checking: obviously simpler• of course: type names may have scopes. . . .
358 / 672
![Page 359: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/359.jpg)
Type aliases
• languages with type aliases (type synonyms): C, Pascal, ML. . . .
• often very convenient (type Coordinate = float * float)• light-weight mechanism
type alias; make t1 known also under name t2
t2 = t1 // t2 i s the ‘ ‘ same type ’ ’ .
• also here: different choices wrt. type equality
Alias if simple types
t1 = i n t ;t2 = i n t ;
• often: t1 and t2 arethe “same” type
Alias of structured types
t1 = a r r a y [ 1 0 ] o f i n t ;t2 = a r r a y [ 1 0 ] o f i n t ;t3 = t2
• mostly t3 /= t1 /= t2359 / 672
![Page 360: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/360.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
360 / 672
![Page 361: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/361.jpg)
Type checking of expressions (and statements )
• types of subexpressions must “fit” to the expected types thecontructs can operate on50
• type checking: a bottom-up task⇒ synthesized attributes, when using AGs• Here: using an attribute grammar specification of the typechecker
• type checking conceptually done while parsing (as actions ofthe parser)
• also common: type checker operates on the AST after theparser has done its job51
• type system vs. type checker• type system: specification of the rules governing the use of
types in a language• type checker: algorithmic formulation of the type system (resp.
implementation thereof)50In case (operator) overloading: that may complicate the picture slightly.
Operators are selected depending on the type of the subexpressions.51one can, however, use grammars as specification of that abstract syntax
tree as well, i.e., as a “second” grammar besides the grammar for concreteparsing. 361 / 672
![Page 362: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/362.jpg)
Grammar for statements and expressions
program → var -decls;stmtsvar -decls → var -decls;var -decl ∣ var -declvar -decl → id ∶ type-exptype-exp → int ∣ bool ∣ array [num ]of type-exp
stmts → stmts;stmt ∣ stmtstmt → if exp then stmt ∣ id ∶= expexp → exp + exp ∣ exporexp ∣ exp [ exp ]
362 / 672
![Page 363: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/363.jpg)
Type checking as semantic rules
363 / 672
![Page 364: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/364.jpg)
Diverse notions
• Overloading• common for (at least) standard operations• also possible for user defined functions/methods . . .• disambiguation via (static) types of arguments• “ad-hoc” polymorphism• implementation:
• put types of parameters as “part” of the name• look-up gives back a set of alternatives
• type-conversions: can be problematic in connection withoverloading
• (generic) polymporphismswap(var x,y: anytype)
364 / 672
![Page 365: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/365.jpg)
INF5110 – Compiler Construction
Run-time environments
2. 02. 2016
365 / 672
![Page 366: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/366.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
366 / 672
![Page 367: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/367.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
367 / 672
![Page 368: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/368.jpg)
Static & dynamic memory layout at runtime
code area
global/static area
stack
free space
heap
Memory
typical memory layout: for languages (asnowadays basically all) with
• static memory• dynamic memory:
• stack• heap
368 / 672
![Page 369: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/369.jpg)
Translated program code
code for procedure 1 proc. 1
code for procedure 2 proc. 2
⋮
code for procedure n proc. n
Code memory
• code segment: almost alwaysconsidered as statically allocated
⇒ neither moved nor changed at runtime• compiler aware of all addresses of“chunks” of code: entry points of theprocedures
• but:• generated code often relocatable• final, absolute adresses given by
linker / loader
369 / 672
![Page 370: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/370.jpg)
Activation record
space for arg’s (parameters)
space for bookkeepinginfo, including returnaddress
space for local data
space for local temporaries
Schematic activation record
• schematic organization of activationrecords/activation block/stack frame. . .
• goal: realize• parameter passing• scoping rules /local variables
treatment• prepare for call/return behavior
• calling conventions on a platform
370 / 672
![Page 371: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/371.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
371 / 672
![Page 372: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/372.jpg)
Full static layout
code for main proc.
code for proc. 1
⋮
code for proc. n
global data area
act. record of main proc.
activation record of proc. 1
⋮
activation record of proc. n
• static addresses of all of memoryknown to the compiler
• executable code• variables• all forms of auxiliary data (for
instance big constants in theprogram, e.g., string literals)
• for instance: Fortran• nowadays rather seldom (or specialapplications like safety criticalembedded systems)
372 / 672
![Page 373: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/373.jpg)
Fortran example
PROGRAM TESTCOMMON MAXSIZEINTEGER MAXSIZEREAL TABLE(10 ) ,TEMPMAXSIZE = 10READ ∗ , TABLE(1 ) ,TABLE(2 ) ,TABLE(3)CALL QUADMEAN(TABLE, 3 ,TEMP)PRINT ∗ ,TEMPEND
SUBROUTINE QUADMEAN(A, SIZE ,QMEAN)COMMON MAXSIZEINTEGERMAXSIZE , SIZEREAL A(SIZE ) ,QMEAN, TEMPINTEGER KTEMP = 0.0IF ( ( SIZE .GT .MAXSIZE ) .OR . ( SIZE .LT . 1 ) ) GOTO 99DO 10 K = 1 , SIZE
TEMP = TEMP + A(K)∗A(K)10 CONTINUE99 QMEAN = SQRT(TEMP/SIZE )
RETURNEND
373 / 672
![Page 374: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/374.jpg)
Static memory layout example/runtime environment
MAXSIZEglobal area
TABLE (1)(2). . .(10)
TEMP
3
main’s act.record
A
SIZE
QMEAN
return address
TEMP
K
“scratch area”
Act. record ofQUADMEAN
in Fortan (here Fortran77)
• parameter passing as pointers tothe actual parameters
• activation record for QUADMEANcontains place for intermediateresults, compiler calculates, howmuch is needed.
• note: one possible memory layoutfor FORTRAN 77, details vary,other implementations exists as domore modern versions of Fortran
374 / 672
![Page 375: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/375.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
375 / 672
![Page 376: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/376.jpg)
Stack-based runtime environments
• so far: no recursion• everything static, including placement of activation records⇒ also return addresses statically known• a very ancient and restrictive arrangement of the run-time envs• calls and returns (also without recursion) follow at runtime aLIFO (= stack-like) discipline
stack of activation records• procedures as abstractions with own local data⇒ run-time memory arrangement where procedure-local data
together with other info (arrange proper returns, parameterpassing) is organized as stack.
• AKA: call stack, runtime stack• AR: exact format depends on language and platform
376 / 672
![Page 377: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/377.jpg)
Situation in languages without local procedures
• recursion, but all procedures are global• C-like languages
Activation record info (besides local data, see later)• frame pointer• control link (or dynamic link)a
• (optional): stack pointer• return address
aLater, we’ll encounter also static links (aka access links).
377 / 672
![Page 378: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/378.jpg)
Euclid’s recursive gcd algo
#inc lude <s t d i o . h>
i n t x , y ;
i n t gcd ( i n t u , i n t v ){ i f ( v==0) re tu rn u ;
e l s e re tu rn gcd ( v , u % v ) ;}
i n t main ( ){ s c an f ( "%d%d" ,&x ,&y ) ;
p r i n t f ( "%d\n" , gcd ( x , y ) ) ;re tu rn 0 ;
}
378 / 672
![Page 379: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/379.jpg)
Stack gcd
x:15y:10global/static area
“AR of main”
x:15y:10
control link
return address
a-record (1st. call)
x:10y:5
control link
return address
a-record (2nd. call)
x:5y:0
control linkfp
return addresssp
a-record (3rd. call)
↓
• control link• aka: dynamic link• refers to caller’s FP
• frame pointer FP• points to a fixed location in
the current a-record• stack pointer (SP)
• border of current stack andunused memory
• return address: program-addressof call-site
379 / 672
![Page 380: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/380.jpg)
Local and global variables and scoping
i n t x = 2 ; /∗ g l o b a l va r ∗/vo id g ( i n t ) ; /∗ p r o t o t yp e ∗/
vo id f ( i n t n ){ s t a t i c i n t x = 1 ;
g ( n ) ;x−−;
}
vo id g ( i n t m){ i n t y = m−1;
i f ( y > 0){ f ( y ) ;
x−−;g ( y ) ;
}}
i n t main ( ){ g ( x ) ;
re tu rn 0 ;}
• global variable x• but: (different) x local to f• remember C:
• call by value• static lexical scoping
380 / 672
![Page 381: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/381.jpg)
Activation records and activation trees
• activation of a function: corresponds to the call of a function• activation record
• data structure for run-time system• holds all relevant data for a function call and control-info in
“standardized” form• control-behavior of functions: LIFO• if data cannot outlive activation of a function⇒ activation records can be arranged in as stack (like here)• in this case: activation record AKA stack frame
GCDmain()
gcd(15,10)
gcd(10,5)
gcd(5,0)
f and g examplemain
g(2)
f(1)
g(1)
g(1)
381 / 672
![Page 382: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/382.jpg)
Variable access and design of ARs
• fp: frame pointer• m (in this example):parameter of g
• AR’s: structurally uniform per language (orat least compiler) / platform
• different function defs, different size of AR⇒ frames on the stack differently sized• note: FP points
• not to the “top” of the frame/stack, but• to a well-chosen, well-defined position in
the frame• other local data (local vars) accessible
relative to that• conventions
• higher addresses “higher up”• stack “grows” towards lower addresses• in the picture: “pointers” to the “bottom”
of the meant slot (e.g.: fp points to thecontrol link)
382 / 672
![Page 383: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/383.jpg)
Layout for arrays of statically known size
vo id f ( i n t x , char c ){ i n t a [ 1 0 ] ;
double y ;. .
}
name offsetx +5c +4a -24y -32
access of c andy
c : 4( fp )y : −32( fp )
access for A[i]
(−24+2∗ i ) ( fp )
383 / 672
![Page 384: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/384.jpg)
Back to the C code again (global and local variables)
i n t x = 2 ; /∗ g l o b a l va r ∗/vo id g ( i n t ) ; /∗ p ro t o t yp e ∗/
vo id f ( i n t n ){ s t a t i c i n t x = 1 ;
g ( n ) ;x−−;
}
vo id g ( i n t m){ i n t y = m−1;
i f ( y > 0){ f ( y ) ;
x−−;g ( y ) ;
}}
i n t main ( ){ g ( x ) ;
re tu rn 0 ;}
384 / 672
![Page 385: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/385.jpg)
2 snapshots of the call stack
x:2x:1 (@f)static
main
m:2
control link
return address
y:1
g
n:1
control link
return address
f
m:1
control linkfp
return address
y:0sp
g
...
x:1x:0 (@f)static
main
m:2
control link
return address
y:1
g
m:1
control linkfp
return address
y:0sp
g
...
• note: call by value, x in f static385 / 672
![Page 386: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/386.jpg)
How to do the “push and pop”: calling sequences
• calling sequences: AKA as linking convention or callingconventions
• for RT environments: uniform design not just of• data structures (=ARs), but also of• uniform actions being taken when calling/returning from a
procedure• how to actually do the details of “push and pop” on thecall-stack
E.g: Parameter passing• not just where (in the ARs) to find value for the actualparameter needs to be defined, but well-defined steps(ultimately code) that copies it there (and potentially reads itfrom there)
• “jointly” done by compiler + OS + HW• distribution of responsibilities between caller and callee:
• who copies the parameter to the right place• who saves registers and restores them• . . .
386 / 672
![Page 387: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/387.jpg)
Steps when calling
• For procedure call (entry)1. compute arguments, store them in the correct positions in the
new activation record of the procedure (pushing them in orderonto the runtume stack will achieve this)
2. store (push) the fp as the control link in the new activationrecord
3. change the fp, so that it points to the beginning of the newactivation record. If there is an sp, copying the sp into the fpat this point will achieve this.
4. store the return address in the new activation record, ifnecessary
5. perform a jump to the code of the called procedure.6. Allocate space on the stack for local var’s by appropriate
adjustement of the sp• procedure exit
1. copy the fp to the sp2. load the control link to the fp3. perform a jump to the return address4. change the sp to pop the arg’s
387 / 672
![Page 388: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/388.jpg)
Steps when calling g
rest of stack
m:2
control link
return addr.fp
y:1
...sp
before call to g
rest of stack
m:2
control link
return addr.fp
y:1
m:1
...sp
pushed param.
rest of stack
m:2
control link
return addr.fp
y:1
m:1
control link...
sp
pushed fp
388 / 672
![Page 389: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/389.jpg)
Steps when calling g (2)
rest of stack
m:2
control link
return addr.
y:1
m:1
control link
return addressfp
. . .sp
fp := sp,push return addr.
rest of stack
m:2
control link
return addr.
y:1
m:1
control link
return addressfp
y:0
...sp
alloc. local var y
389 / 672
![Page 390: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/390.jpg)
Treatment of auxiliary results: “temporaries”
rest of stack
. . .
control link
return addr.fp
. . .
address of x[i]
result of i+j
result of i/ksp
new AR for f(about to becreated)
...
• calculations need memory forintermediate results.
• called temporaries in ARs.
x [ i ] = ( i + j ) ∗ ( i /k + f ( j ) ) ;
• note: x[i] represents an address orreference, i, j, k represent valuesa
• assume a strict -left-to-right evaluation(call f(j) may change values.)
• stack of temporaries.• [NB: compilers typically use registers asmuch as possible, what does not fitthere goes into the AR.]
aintegers are good for array-offsets, so they act as “references” as well.
390 / 672
![Page 391: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/391.jpg)
Variable-length data
t ype In t_Vector i sa r r a y ( INTEGER r ange <>) o f INTEGER ;
p r o c e du r e Sum( low , h igh : INTEGER ;A: Int_Vector ) r e t u r n INTEGER
i si : i n t e g e r
beg i n. . .
end Sum ;
• Ada example• assume: array passed by value(“copying”)
• A[i]: calculated as @6(fp) + 2*i• in Java and other languages: arrayspassed by reference
• note: space for A (as ref) and sizeof A is fixed-size (as well as lowand high)
rest of stack
low:. . .
high:. . .
A:
size of A: 10
control link
return addr.fp
i:...
A[9]
. . .
A[0]
...sp
AR of call to SUM
391 / 672
![Page 392: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/392.jpg)
Nested declarations (“compound statements”)
v o i d p ( i n t x , doub l e y ){ cha r a ;
i n t i ;. . . ;
A: { doub l e x ;i n t j ;. . . ;
}. . . ;
B : { cha r ∗ a ;i n t k ;. . . ;
} ;. . . ;
}
rest of stack
x:
y:
control link
return addr.fp
a:
i:
x:
j:
...sp
area for block A allocated
rest of stack
x:
y:
control link
return addr.fp
a:
i:
a:
k:
...sp
area for block B allocated
392 / 672
![Page 393: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/393.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
393 / 672
![Page 394: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/394.jpg)
Nested procedures in Pascal1
program nonLoca lRe f ;p r o c e du r e p ;v a r n : i n t e g e r ;
p r o c e du r e q ;beg i n
(∗ a r e f to n i s nownon− l o c a l , non− g l o b a l ∗)
end ; (∗ q ∗)
p r o c e du r e r ( n : i n t e g e r ) ;beg i n
q ;end ; (∗ r ∗)
beg i n (∗ p ∗)n := 1 ;r ( 2 ) ;
end ; (∗ p ∗)
beg i n (∗ main ∗)p ;
end .
• proc. p contains q and r nested• also “nested” (i.e., local) in p: integer n
• in scope for q and r but• neither global nor local to q and r
394 / 672
![Page 395: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/395.jpg)
Accessing non-local var’s (here access n from q)
vars of main
control link
return addr.
n:1
p
n:2
control link
return addr.
r
control linkfp
return addr.sp
q
...
calls m → p → r → q
• n in q: under lexical scope: n declaredin procedure p is meant
• not reflected in the stack (of course) asthat represents the run-time call stack
• remember: static links (or access links)in connection with symbol tables
Symbol tables• “name-addressable”mapping
• access atcompile time
• cf. scope tree
Dynamic memory• “adresss-adressable”mapping
• access at run time• stack-organized,reflecting paths incall graph
• cf. activation tree
395 / 672
![Page 396: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/396.jpg)
Access link as part of the AR
vars of main
(no access link)
control link
return addr.
n:1
n:2
access link
control link
return addr.
access link
control linkfp
return addr.sp
...
calls m → p → r → q
• access link (or static link): part ofAR (at fixed position)
• points to stack-frame representingthe current AR of the staticallyenclosed “procedural” scope
396 / 672
![Page 397: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/397.jpg)
Example with multiple levels
program cha i n ;
procedure p ;var x : i n t ege r ;
procedure q ;procedure r ;begin
x :=2;. . . ;i f . . . then p ;
end ; (∗ r ∗)begin
r ;end ; (∗ q ∗)
beginq ;
end ; (∗ p ∗)
begin (∗ main ∗)p ;
end .
397 / 672
![Page 398: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/398.jpg)
Access chaining
AR of main
(no access link)
control link
return addr.
x:1
access link
control link
return addr.
access link
control linkfp
return addr.sp
...
calls m → p → q → r
• program chain• access (conceptual): fp.al.al.x• access link slot: fixed “offset” inside AR(but: AR’s differently sized)
• “distance” from current AR to place of x• not fixed, i.e.• statically unknown!
• However: number of access linkderenferences statically known
• lexical nesting level
398 / 672
![Page 399: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/399.jpg)
Implementing access chaining
As example:
fp.al.al.al. ... al.x
• access need to be fast => use registers• assume, at fp in dedicated register
4( fp ) −> reg // 14( fp ) −> reg // 2. . .4( fp ) −> reg // n = d i f f e r e n c e i n n e s t i n g l e v e l s6( r eg ) // a c c e s s con t en t o f x
• often: not so many block-levels/access chains nessessary
399 / 672
![Page 400: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/400.jpg)
Calling sequence
• For procedure call (entry)
1. compute arguments, store them in the correct positions in thenew activation record of the procedure (pushing them in orderonto the runtume stack will achieve this)
2. • push access link, value calculated via link chaining (“fp.al.al.... ”)
• store (push) the fp as the control link in the new AR3. change fp, to point to the beginning of the new AR. If there is
an sp, copying sp into fp at this point will achieve this.4. store the return address in the new AR, if necessary5. perform a jump to the code of the called procedure.6. Allocate space on the stack for local var’s by appropriate
adjustement of the sp• procedure exit
1. copy the fp to the sp2. load the control link to the fp3. perform a jump to the return address4. change the sp to pop the arg’s and the access link
400 / 672
![Page 401: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/401.jpg)
Calling sequence: with access links
AR of main(no access link)
control link
return addr.x:...
access link
control link
return addr.
access link
control link
return addr.
no access link
control link
return addr.x:...
access link
control link
return addr.
access link
control linkfp
return addr. sp...
after 2nd call to r
• main → p → q → r → p → q → r• calling sequence: actions to do the“push & pop”
• distribution of responsibilitiesbetween caller and callee
• generate an appropriate accesschain, chain-length staticallydetermined
• actual computation (of course)done at run-time
401 / 672
![Page 402: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/402.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
402 / 672
![Page 403: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/403.jpg)
Example with multiple levels
program cha i n ;
procedure p ;var x : i n t ege r ;
procedure q ;procedure r ;begin
x :=2;. . . ;i f . . . then p ;
end ; (∗ r ∗)begin
r ;end ; (∗ q ∗)
beginq ;
end ; (∗ p ∗)
begin (∗ main ∗)p ;
end .
403 / 672
![Page 404: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/404.jpg)
Access chaining
AR of main
(no access link)
control link
return addr.
x:1
access link
control link
return addr.
access link
control linkfp
return addr.sp
...
calls m → p → q → r
• program chain• access (conceptual): fp.al.al.x• access link slot: fixed “offset” inside AR(but: AR’s differently sized)
• “distance” from current AR to place of x• not fixed, i.e.• statically unknown!
• However: number of access linkderenferences statically known
• lexical nesting level
404 / 672
![Page 405: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/405.jpg)
Procedures are parameter
program c l o s u r e e x ( output ) ;
procedure p ( procedure a ) ;begin
a ;end ;
procedure q ;var x : i n t ege r ;
procedure r ;begin
w r i t e l n ( x ) ;end ;
beginx := 2 ;p ( r ) ;
end ; (∗ q ∗)
begin (∗ main ∗)q ;
end .
405 / 672
![Page 406: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/406.jpg)
Procedures as parameters, same example in Go
package mainimport ( " fmt" )
var p = func ( a ( func ( ) ( ) ) ) { // ( u n i t −> un i t ) −> un i ta ( )
}
var q = func ( ) {var x = 0var r = func ( ) {fmt . P r i n t f ( "␣x␣=␣%v" , x )}x = 2p ( r ) // r as argument
}
func main ( ) {q ( ) ;
}
406 / 672
![Page 407: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/407.jpg)
Procedures as parameters, same example in ocaml
l e t p ( a : u n i t −> un i t ) : u n i t = a ( ) ; ;
l e t q ( ) =l e t x : i n t r e f = r e f 1i n l e t r = f unc t i on ( ) −> ( p r i n t_ i n t ! x ) (∗ d e r e f ∗)i nx := 2 ; (∗ as s i gnment to r e f − typed va r ∗)p ( r ) ; ;
q ( ) ; ; (∗ ‘ ‘ body o f main ’ ’ ∗)
407 / 672
![Page 408: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/408.jpg)
Closures in [?]
• [?] rather “implementation centric”• closure there:
• restricted setting• specific way to achieve closures• specific semantics of non-local vars (“by reference”)
• higher-order functions:• functions as arguments and return values• nested function declaration
• similar problems with: “function variables”• Example shown: only procedures as parameters
408 / 672
![Page 409: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/409.jpg)
Closures, schematically
• indepdendent from concrete design of the RTE/ARs:• what do we need to execute the body of a procedure?
Closure (abstractly)A closure is a function bodya together with the values for all itsvariables, including the non-local ones.60
aResp.: at least the possibility to locate that.
• individual AR not enough for all variables used (non-local vars)• in stack-organized RTE’s:
• fortunately ARs are stack-allocated→ with clever use of “links” (access/static links): possible to
access variables that are “nested further out’/ deeper in thestack (following links)
409 / 672
![Page 410: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/410.jpg)
Organize access with procedure parameters
• when calling p: allocate a stack frame• executing p calls a => another stack frame• number of parameters etc: knowable from the type of a• but 2 problems
“control-flow problemcurrently only RTE, but: how can(the compiler arrange that) pcalls a (and allocate a frame fora) if a is not know yet?
data problemHow can one statically arrangethat a will be able to accessnon-local variables if statically it’snot known what a will be?
• solution: for a procedure variable (like a): store in AR• reference to the code of argument (as representation of the
function body)• reference to the frame, i.e., the relevant frame pointer (here:
to the frame of q where r is defined)• this pair = closure!
410 / 672
![Page 411: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/411.jpg)
Closure for formal parameter a of the example
• stack after the call to p• closure ⟨ip, ep⟩• ep: refers to q’s framepointer
• note: distinction in callingsequence for
• calling orginary proc’s and• calling procs in proc
parameters (i.e., viaclosures)
• it may be unified (“closures”only)
411 / 672
![Page 412: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/412.jpg)
After calling a (= r)
• note: static link of the newframe: used from theclosure!
412 / 672
![Page 413: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/413.jpg)
Making it uniform
• note: calling conventionsdiffer
• calling procedures asformal parameters
• “standard” procedures(statically known)
• treatment can be madeuniform
413 / 672
![Page 414: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/414.jpg)
Limitations of stack-based RTEs
• procedures: central (!) control-flow abstraction in languages• stack-based allocation: intuitive, common, and efficient(supported by HW)
• used in many/most languages• procedure calls and returns: LIFO (= stack) behavior• AR: local data for procedure body
Underlying assumption for stack-based RTEsThe data (=AR) for a procedure cannot outlive the activationwhere they are declared.
• assumption can break for many reasons• returning references of local variables• higher-order functions (or function variables)• “undisciplined” control flow (rather deprecated, can break any
scoping rules, or procedure abstraction)• explicit memory allocation (and deallocation), pointer
arithmetic etc.414 / 672
![Page 415: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/415.jpg)
Dangling ref’s due to returing references
i n t ∗ dang l e ( vo id ) {i n t x ; // l o c a l va rre tu rn &x ; // add r e s s o f x
}
• similar: returning references to objects created via new• variable’s lifetime may be over, but the reference lives on . . .
415 / 672
![Page 416: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/416.jpg)
Function variables
program Funcvar ;var pv : Procedure ( x : i n t ege r ) ;
Procedure Q( ) ;var
a : i n t ege r ;Procedure P( i : i n t ege r ) ;begin
a:= a+i ; (∗ a def ’ ed o u t s i d e ∗)end ;
beginpv := @P; (∗ ‘ ‘ r e t u rn ’ ’ P , ∗)
end ; (∗ "@" dependent on d i a l e c t ∗)begin
Q( ) ;pv ( 1 ) ;
end .
funcvarRuntime error 216 at $0000000000400233
$0000000000400233$0000000000400268$00000000004001E0 416 / 672
![Page 417: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/417.jpg)
Functions as return values
package mainimport ( " fmt" )
var f = func ( ) ( func ( i n t ) i n t ) { // un i t −> ( i n t −> i n t )var x = 40 // l o c a l v a r i a b l evar g = func ( y i n t ) i n t { // ne s t ed f u n c t i o n
re tu rn x + 1}x = x+1 // update xre tu rn g // f u n c t i o n as r e t u r n v a l u e
}
func main ( ) {var x = 0var h = f ( )fmt . P r i n t l n ( x )var r = h (1)fmt . P r i n t f ( "␣ r ␣=␣%v" , r )
}
• function g• defined local to f• uses x, non-local to g, local to f• is being returned from f 417 / 672
![Page 418: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/418.jpg)
Fully-dynamic RTEs
• full higher-order functions = functions are “data” same aseverything else
• function being locally defined• function as arguments to other functions• functions returned by functions
→ ARs cannot be stack-allocated• closures needed, but heap-allocated• objects (and references): heap-allocated• less “disciplined” memory handling than stack-allocation• garbage collection52
• often: stack based allocation + fully-dynamic (= heap-based)allocation
52The stack discipline can be seen as a particularly simple (and efficient)form of garbage collection: returnion from a function makes it clear that thelocal data can be thrashed.
418 / 672
![Page 419: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/419.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
419 / 672
![Page 420: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/420.jpg)
Object-orientation
• class-based/inheritance-based OO• classes and sub-classes• typed references to objects• virtual and non-virtual methods
420 / 672
![Page 421: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/421.jpg)
Virtual and non-virtual methods
c l a s s A {i n t x , y
vo id f ( s , t ) { . . . K . . . } ;v i r t u a l vo id g (p , q ) { . . . L . . . } ;
} ;
c l a s s B ex t end s A {i n t z
vo id f ( s , t ) { . . . Q . . . } ;r e d e f vo id g (p , q ) { . . . M . . . } ;v i r t u a l vo id h ( r ) { . . . N . . . }
} ;
c l a s s C ex t end s B {i n t u ;r e d e f vo id h ( r ) { . . . P . . . } ;
}
421 / 672
![Page 422: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/422.jpg)
Call to virtual and non-virtual methods
non-virtual method f
call targetrA.f KrB .f QrC .f Q
virtual methods g and h
call targetrA.g L or MrB .g MrC .g M
rA.h illegalrB .h N or PrC .h P
422 / 672
![Page 423: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/423.jpg)
Late binding/dynamic binding
• details very much depend on the language/flavor of OO• single vs. multiple inheritance• method update, method extension possible• how much information available (e.g., static type information)
• simple approach: “embedding” methods (as references)• seldomly done, but for updateable methods
• using inheritance graph• each object keeps a pointer to it’s class (for locate virtual
methods)• virtual function table
• in static memory• no traversal necessary• class structure need be known at compile-time• C++
423 / 672
![Page 424: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/424.jpg)
Virtual function table
• static check (“type check”) of rX .f ()• both for virtual and non-virtuals• f must be defined in X or one of its
superclasses
• non-virtual binding: finalized by the combiler(static binding)
• virtual methods: enumerated (with offset)from the first class with a virtual method,redefinitions get the same “number”
• object “headers”: point to the classe’s virtualfunction table
• rA.g():
c a l l r_A . v i r t t a b [ g_o f f s e t ]
• compiler knows• g_offset = 0• h_offset = 1 424 / 672
![Page 425: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/425.jpg)
Virtual method implementation in C++
• according to [?]
c l a s s A {p u b l i c :double x , y ;vo id f ( ) ;v i r t u a l vo id g ( ) ;
} ;
c l a s s B : p u b l i c A {p u b l i c :double z ;vo id f ( ) ;v i r t u a l vo id h ( ) ;
} ;
425 / 672
![Page 426: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/426.jpg)
Untyped references to objects (e.g. Smalltalk)
• all methods virtual• problem of virtual-tablesnow: virtual tables need tocontain all methods of allclasses
• additional complication:method extension
• Therefore: implementationof r.g() (assume: fomitted)
• go to the object’s class• search for g following the
superclass hierarchy.
426 / 672
![Page 427: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/427.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
427 / 672
![Page 428: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/428.jpg)
Communicating values between procedures
• procedure abstraction, modularity• parameter passing = communication of values betweenprocedures
• from caller to callee (and back)• binding actual parameters• with the help of the RTE• formal parameters vs. actual parameters• two modern versions
1. call by value2. call by reference
428 / 672
![Page 429: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/429.jpg)
CBV and CBR, roughly
Core distinction/questionon the level of caller/callee activation records (on the stack frame):how does the AR of the callee get hold of the value the caller wantsto hand over?1. callee’s AR with a copy of the value for the formal parameter2. the callee AR with a pointer to the memory slot of the actual
parameter
• if one has to choose only one: it’s call by value53• remember: non-local variables (in lexical scope), nestedprocedures, and even closures:
• those variables are “smuggled in” by reference• [NB: there are also by value closures]
53CBV is in a way the prototypical, most dignified way of parameter passsing,supporting the procedure abstraction. If one has references (explicit or implicit,of data on the heap, typically), then one has call-by-value-of-references, which,in some way “feels” for the programmer as call-by-reference. Some people evencall that call-by-reference, even if it’s technically not.
429 / 672
![Page 430: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/430.jpg)
Parameter passing "by-value"
• in C: CBV only parameterpassing method
• in some lang’s: formalvariables “immutable”
• straightforward: copy actualparameters → formalparameters (in the ARs).
vo id i n c 2 ( i n t x ){ ++x , ++x ; }
vo id i n c 2 ( i n t ∗ x ){ ++(∗x ) , ++(∗x ) ; }/∗ c a l l : i n c (&y ) ∗/
vo id i n i t ( i n t x [ ] , i n t s i z e ) {i n t i ;f o r ( i =0; i<s i z e ,++ i ) x [ i ]= 0
}
arrays: “by-reference” data
430 / 672
![Page 431: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/431.jpg)
Call-by-reference
• hand overpointer/reference/address ofthe actual parameter
• useful especially for largedata structures
• typically: actual parametermust be a variable
• Fortran actually allowsthings like P(5,b) andP(a+b,c).
vo id i n c 2 ( i n t ∗ x ){ ++(∗x ) , ++(∗x ) ; }/∗ c a l l : i n c (&y ) ∗/
vo i d P( p1 , p2 ) {. .p1 = 3
}va r a , b , c ;P( a , c )
431 / 672
![Page 432: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/432.jpg)
Call-by-value-result
• call-by-value-result can give different results from cbr• allocated as a local variable (as cbv)• however: copied “two-way”
• when calling: actual → formal parameters• when returning: actual ← formal parameters
• aka: “copy-in-copy-out” (or “copy-restore”)• Ada’s in and out parmeters• when are the value of actual variables determined when doing“actual ← formal parameters”
• when calling• when returning
• not the cleanest parameter passing mechanism around. . .
432 / 672
![Page 433: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/433.jpg)
Call-by-value-result example
vo id p ( i n t x , i n t y ){
++x ;++y ;
}
main ( ){ i n t a = 1 ;
p ( a , a ) ;re tu rn 0 ;
}
• C-syntax (C has cbv, not cbvr)• note: aliasing (via the arguments, here obvious)• cbvr: same as cbr, unless aliasing “messes it up”54
54One can ask though, if not call-by-reference is messed-up in the exampleitself.
433 / 672
![Page 434: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/434.jpg)
Call-by-name (C-syntax)
• most complex (or is it)• hand over: textual representation (“name”) of the argument(substitution)
• in that respect: a bit like macro expansion (but lexicallyscoped)
• actual paramater not calculated before actually used!• on the other hand: if neeeded more than once: recalculatedover and over again
• aka: delayed evaluation• Implementation
• actual paramter: represented as a small procedure (thunk,suspension), if actual parameter = expression
• optimization, if actually parameter = variable (works likecall-by-reference then)
434 / 672
![Page 435: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/435.jpg)
Call-by-name examples
• in (imperative) languages without procedure parameters:• delayed evaluation most visible when dealing with things likea[i]
• a[i] is actually like “apply a to index i”• combine that with side-effects (i++) ⇒ pretty confusing
vo id p ( i n t x ) { . . . ; ++x ; }
• call as p(a[i])• corresponds to ++(a[i])• note:
• ++ _ has a side effect• i may change in ...
i n t i ;i n t a [ 1 0 ] ;vo id p ( i n t x ) {++i ;++x ;
}
main ( ) {i = 1 ;a [ 1 ] = 1 ;a [ 2 ] = 2 ;p ( a [ i ] ) ;re tu rn 0 ;
}
435 / 672
![Page 436: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/436.jpg)
Another example: “swapping”
i n t i ; i n t a [ i ] ;
swap ( i n t a , b ) {i n t i ;i = a ;a = b ;b = i ;
}
i = 3 ;a [ 3 ] = 6 ;
swap ( i , a [ i ] ) ;
• note: local and global variable i
436 / 672
![Page 437: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/437.jpg)
Call-by-name illustrations
procedure P( par ) : name par , i n t p a r tbegin
i n t x , y ;. . .par := x + y ; (∗ x := par + y ∗)
end ;
P( v ) ;P( r . v ) ;P ( 5 ) ;P( u+v )
v r.v 5 u+vpar := x+y ok ok error errorx := par +y ok ok ok ok
437 / 672
![Page 438: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/438.jpg)
Call by name (Algol)
begin comment Simple a r r a y example ;p ro c edu r e z e r o ( Arr , i , j , u1 , u2 ) ;i n t e g e r Arr ;i n t e g e r i , j , u1 , u2 ;
beg inf o r i := 1 s t ep 1 u n t i l u1 do
f o r j := 1 s t ep 1 u n t i l u2 doArr := 0
end ;
i n t e g e r a r r a y Work [ 1 : 1 0 0 , 1 : 2 0 0 ] ;i n t e g e r p , q , x , y , z ;x := 100 ;y := 200ze r o (Work [ p , q ] , p , q , x , y ) ;end
438 / 672
![Page 439: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/439.jpg)
Lazy evaluation
• call-by-name• complex & potentially confusing (in the presence of side
effects)• not really used (there)
• declarative/functional languages: lazy evaluation• optimization:
• avoid recalculation of the argument⇒ remember (and share) results after first calculation
(“memoization”)• works only in absence of side-effects
• most prominently: Haskell• useful for operating on infinite data structures (for instance:streams)
439 / 672
![Page 440: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/440.jpg)
Lazy evaluation / streams
magic : : I n t −> I n t −> [ I n t ]magic 0 _ = [ ]magic m n = m : ( magic n (m+n ) )
g e t I t : : [ I n t ] −> I n t −> I n tg e t I t [ ] _ = undef inedg e t I t ( x : x s ) 1 = xg e t I t ( x : x s ) n = g e t I t xs (n−1)
440 / 672
![Page 441: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/441.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
441 / 672
![Page 442: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/442.jpg)
Management of dynamic memory: GC & alternativesa
aStarting point slides from Ragnhild Kobro Runde, 2015.
• dynamic memory: allocation & deallocation at run-time• different alternatives
1. manual• “alloc”, “free”• error prone
2. “stack” allocated dynamic memory• typically not called GC
3. automatic reclaim of unused dynamic memory• requires extra provisions by the compiler/RTE
442 / 672
![Page 443: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/443.jpg)
Heap
• “heap” unrelated to the well-knownheap-data structure from A&D
• part of the dynamic memory• contains typically
• objects, records (which aredynamocally allocated)
• often: arrays as well• for “expressive” languages:
heap-allocated activation records• coroutines (e.g. Simula)• higher-order functions
code area
global/static area
stack
free space
heap
Memory
443 / 672
![Page 444: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/444.jpg)
Problems with free use of pointers
i n t ∗ dang l e ( v o i d ) {i n t x ; // l o c a l va rr e t u r n &x ; // add r e s s o f x
}
t y p e d e f i n t (∗ proc ) ( v o i d ) ;
p roc g ( i n t x ) {i n t f ( v o i d ) { /∗ i l l e g a l
∗/r e t u r n x ;
}r e t u r n f ;
}
main ( ) {proc c ;c = g ( 2 ) ;p r i n t f ( "%d\n" , c ( ) ) ; /∗ 2? ∗/r e t u r n 0 ;
}
• as seen before: references,higher-order functions, coroutinesetc ⇒ heap-allocated ARs
• higher-order functions: typical forfunctional languages,
• heap memory: no LIFO discipline• unreasonable to expect user to“clean up” AR’s (already alloc andfree is error-prone)
• ⇒ garbage collection (alreadydating back to 1958/Lisp)
444 / 672
![Page 445: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/445.jpg)
Some basic design decisions
• gc approximative, but non-negotiable condition: never reclaimcells which may be used in the future
• one basic decision:1. never move objects55
• may lead to fragmentation2. move objects which are still needed
• extra administration/information needed• all reference of moved objects need adaptation• all free spaces collected adjacently (defragmentation)
• when to do gc?• how to get info about definitely unused/potentially usedobects?
• “monitor” the interaction program ↔ heap while it runs, tokeep “up-to-date” all the time
• inspect (at approritate points in time) the state of the heap55Objects here are meant as heap-allocated entities, which in OO languages
includes objects, but here referring also to other data (records, arrays, closures. . . ).
445 / 672
![Page 446: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/446.jpg)
Mark (and sweep): marking phase
• observation: heap addresses only reachabledirectly through variables (with references), kept in the
run-time stack (or registers)indirectly following fields in reachable objects, which point
to further objects . . .• heap: graph of objects, entry points aka “roots” or root set• mark: starting from the root set:
• find reachable objects, mark them as (potentially) used• one boolean (= 1 bit info) as mark• depth-first search of the graph
446 / 672
![Page 447: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/447.jpg)
Marking phase: follow the pointers via DFS
• layout (or “type”) of objects need to beknown to determine where pointers are
• food for thought: doing DFS requires astack, in the worst case of comparablesize as the heap itself . . . .
447 / 672
![Page 448: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/448.jpg)
Compactation
MarkedCompacted
448 / 672
![Page 449: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/449.jpg)
After marking?
• known classification in “garbage” and “non-garbage”• pool of “unmarked” objects• however: the “free space” not really ready at hand:• two options:
1. sweep• go again through the heap, this time sequentially (no
graph-search)• collect all unmarked objects in free list• objects remain at their place• RTE need to allocate new object: grab free slot from free list
2. compaction as well ::• avoid fragmentation• move non-garbage to one place, the rest is big free space• when moving objects: adjust pointers
449 / 672
![Page 450: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/450.jpg)
Stop-and-copy
• variation of the previous compactation• mark & compactation can be done in recursive pass• space for heap-managment
• split into two halves• only one half used at any given point in time• compactation by copying all non-garbage (marked) to the
currently unused half
450 / 672
![Page 451: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/451.jpg)
Step by step
451 / 672
![Page 452: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/452.jpg)
Step by step
452 / 672
![Page 453: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/453.jpg)
Step by step
453 / 672
![Page 454: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/454.jpg)
Step by step
454 / 672
![Page 455: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/455.jpg)
Step by step
455 / 672
![Page 456: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/456.jpg)
INF5110 – Compiler Construction
Intermediate code generation
2. 02. 2016
456 / 672
![Page 457: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/457.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
457 / 672
![Page 458: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/458.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
458 / 672
![Page 459: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/459.jpg)
Schematic anatomy of a compilera
aThis section, based on slides from Stein Krogdahl, 2015.
• code generator:• may in itself be “phased”• using additional intermediate representation(s) (IR) and
intermediate code459 / 672
![Page 460: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/460.jpg)
A closer look
460 / 672
![Page 461: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/461.jpg)
Various forms of “executable” code
• different forms of code: relocatable vs. “absolute” code,relocatable code from libraries, assembler, etc
• often: specific file extensions• Unix/Linux etc.
• asm: *-s• rel: *.a• rel from library: *.a• abs: files without file extension (but set as executable)
• Windows:• abs: *.exe56
• byte code (specifically in Java)• a form of intermediate code, as well• executable in the JVM• in .NET/C♯: CIL
• also called byte-code, but compiled further
56.exe-files include more, and “assembly” in .NET even more461 / 672
![Page 462: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/462.jpg)
Generating code: compilation to machine code
• 3 main forms or variations:1. machine code in textual assembly format (assembler can
“compile” it to 2. and 3.)2. relocatable format (further processed by loader3. binary machine code (directly executable)
• seen as different representations, but otherwise equivalent• in practice: for portability
• as another intermediate code: “platform independent” abstractmachine code possible.
• capture features shared roughly by many platforms• eg. there are stack frames, static links, and push and pop, but
exact layout of the frames is platform dependent• platform dependent details:
• platform dependent code• filling in call-sequence / linking conventions
done in a last step
462 / 672
![Page 463: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/463.jpg)
Byte code generation
• semi-compiled well-defined format• platform.independent• further away from any HW, quite more high-level• for example: Java byte code (or CIL for .NET and C♯)
• can be interpreted, but often compiled further to machine code(“just-in-time compiler” JIT)
• exectured (interpreted) in a “virtual machine” (JVM)• often: stack-oriented execution code (in post-fix format)• also internal intermediate code (in compiled languages) mayhave stack-oriented format (“P-code”)
463 / 672
![Page 464: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/464.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
464 / 672
![Page 465: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/465.jpg)
Use of intermediate code
• two kinds of IC covered1. three-address code
• generic (platform-independent) abstract machine code• new names for all intermediate results• can be seen as unbounded pool of maschine registers• advantages (portability, optimization . . . )
2. P-code (“Pascal-code”, a la Java “byte code”• originally proposed for interpretation• now often translated before execution (cf. JIT-compilation)• intermediate results in stack (with postfix operations)
• many variations and elaborations for both kinds• addresses symbolically or represented as numbers (or both)• granularity/“instruction set”/level of abstract: high-level op’s
available e.g., for array-access or: translation in moreelementary op’s needed.
• operands (still) typed or not• . . .
465 / 672
![Page 466: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/466.jpg)
Various translations in the lecture
• AST here: tree structureafter semantic analysis, let’sjust call it AST+ or justsimply AST.
• translation AST ⇒ P-code:appox. as in Oblig 2
• we touch upon many generalproblems/techniques in“translations”
• on (important) we ignore fornow: register allocation
AST+
TAC
p-code
466 / 672
![Page 467: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/467.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
467 / 672
![Page 468: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/468.jpg)
Three-address code
• common (form of) IR
Basic format
x=y op z
• x , y , y : names, constants, temporaries . . .• some operations need fewer arguments
• example of a (common) linear IR• linear IR: ops include control-flow instructions (like jumps)• alternative linear IRs (on a similar level of abstraction):1-address codes (stack-machine code), 2 address codes.
• well-suited for optimizations• modern archictures often have 3-address code like instructionsets (RISC-architectures)
468 / 672
![Page 469: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/469.jpg)
TAC example (expression)
2*a+(b-3)
+
*
2 a
-
b 3
Three-address code
t1 = 2 ∗ at2 = b − 3t3 = t1 + t2
alternative sequence
t1 = b − 3t2 = 2 ∗ at3 = t2 + t1
469 / 672
![Page 470: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/470.jpg)
TAC instruction set
• basic format: x = y op z• but also:
• x = op z• x = y
• operators: +,-,*,/, <, >, and, or• read x , write x• labelL (sometimes called a “pseudo-instruction”)• conditional jumps: if_false x goto L
• t1, t2, t3 . . . . (or t1, t2, t3, . . . ): temporaries (ortemporary variables)
• assumed: unbounded reservoir of those• note: “non-destructive” assignments (single-assignment)
470 / 672
![Page 471: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/471.jpg)
Illustration: translation to TAC
Source
r ead x ; { i n pu t an i n t e g e r }i f 0<x then
f a c t := 1 ;r e p e a t
f a c t := f a c t ∗ x ;x := x −1
u n t i l x = 0 ;w r i t e f a c t { output : f a c t o r i a l o f x }
end
Target: TAC
r e ad x ; { i n pu t an i n t e g e r }i f 0<x then
f a c t := 1 ;r e p e a t
f a c t := f a c t ∗ x ;x := x −1
u n t i l x = 0 ;w r i t e f a c t { output : f a c t o r i a l o f x }
end
471 / 672
![Page 472: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/472.jpg)
Variations in the design of TA-code
• provide operators for int, long, float . . . .?• how to represent program variables
• names/symbols• pointers to the declaration in the symbol table?• (abstract) machine address?
• how to store/represent TA instructions?• quadruples: 3 “addresses” + the op• triple possible (if target-address (left-hand side) always a new
temporary)
472 / 672
![Page 473: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/473.jpg)
Quadruple-representation for TAC (in C)
473 / 672
![Page 474: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/474.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
474 / 672
![Page 475: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/475.jpg)
P-code
• different common intermediate code / IR• aka “one-address code”57 or stack-machine code• originally developed for Pascal• remember: post-fix printing of syntax trees (for expressions)and “reverse polish notation”
57There’s also two-address codes, but those have fallen more or less in disuse.475 / 672
![Page 476: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/476.jpg)
Example: expression evaluation 2*a+(b-3)
l d c 2 ; l oad con s t an t 2l od a ; l o ad v a l u e o f v a r i a b l e ampi ; i n t e g e r m u l t i p l i c a t i o nl od b ; l oad v a l u e o f v a r i a b l e bl d c 3 ; l oad con s t an t 3s b i ; i n t e g e r s u b s t r a c t i o nad i ; i n t e g e r a d d i t i o n
476 / 672
![Page 477: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/477.jpg)
P-code for assignments: x := y + 1
• assignments:• variables left and right: L-values and R-values• cf. also the values ↔ references/addresses/pointers
l da x ; l o ad add r e s s o f xl od y ; l o ad v a l u e o f yl d c 1 ; l oad con s t an t 1ad i ; addsto ; s t o r e top to add r e s s
; be low top & pop both
477 / 672
![Page 478: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/478.jpg)
P-code of the faculty function
r ead x ; { i n pu t an i n t e g e r }i f 0<x then
f a c t := 1 ;r e p e a t
f a c t := f a c t ∗ x ;x := x −1
u n t i l x = 0 ;w r i t e f a c t { output : f a c t o r i a l o f x }
end
478 / 672
![Page 479: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/479.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
479 / 672
![Page 480: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/480.jpg)
Expression grammar
Grammar
exp1 → id = exp2exp → aexpaexp → aexp2 + factoraexp → factor
factor → ( exp )
factor → numfactor → id
(x=x+3)+4+
x=
+
x 3
4
480 / 672
![Page 481: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/481.jpg)
Generating p-code with a-grammars
• goal: p-code as attributes of the grammar symbols/nodes ofthe syntax trees
• “syntax-directed translation”• technical task: turn the syntax tree into a linear IR (hereP-code)
⇒ • “linearization” of the syntactic tree structure• while translating the nodes of the tree (the syntactical
sub-expressions) one-by-one
• conceptual picture only, not done line that (with A-grammars)in practice!
• not recommended at any rate (for modern/reasonably complexlanguage): code generation while parsing
481 / 672
![Page 482: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/482.jpg)
A-grammar for the statement/expression
• dealing here with expressions/assignment: leaves out certaincomplications
• in particular: control-flow complications• two-armed conditionals• loops etc.
• but: code-generation “intra-procedural” only, rest is filled in ascall-sequences.
• A-grammar:• rather simple and straightforwad• only 1 synthesized attribute: pcode
482 / 672
![Page 483: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/483.jpg)
A-grammar
• “string” concatenation: ++ and ˆ (inside one command)58
productions/grammar rules semantic rulesexp1 → id = exp2 exp1.pcode = ”lda”ˆid .strval ++
exp2.pcode ++ ”stn”exp → aexp exp.pcode = aexp.pcode
aexp1 → aexp2 + factor aexp1.pcode = aexp2.pcode++ factor .pcode++ ”adi”
aexp → factor aexp.pcode = factor .pcodefactor → ( exp ) factor .pcode = exp.pcodefactor → num factor .pcode = ”ldc”ˆnum.strvalfactor → id factor .pcode = ”lod”ˆnum.strval
58So, the result is not 100% linear. In general, one should not produce a flatstring already.
483 / 672
![Page 484: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/484.jpg)
(x = x + 3) + 4
Attributed tree+
x∶=
+
x 3
4
result
lod x ldc 3
lod xldc 3adi
ldc 4
lda xlod xldc 3adi 3stn
Result
l da xl od xl d c 3ad istnldc 4ad i ; +
• note: here x=x+3 has side effect and “return” value (in C):• stn (“store non-destructively”)
• similar to sto , but non-destructive1. take top element, store it at address represented by 2nd top2. discard addess, but not the top-value
484 / 672
![Page 485: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/485.jpg)
Overview: p-code data structures
t ype symbol = s t r i n g
t ype exp r =| Var o f symbol| Num o f i n t| P lu s o f exp r ∗ exp r| As s i gn o f symbol ∗ exp r
t ype i n s t r =(∗ p−code i n s t r u c t i o n s ∗)
LDC o f i n t| LOD o f symbol| LDA o f symbol| ADI| STN| STO
t ype t r e e = One l i n e o f i n s t r| Seq o f t r e e ∗ t r e e
t ype program = i n s t r l i s t
• symbols:• here strings for simplicity• concretely, symbol table may be involved, or variable names
already resolved in addresses etc.
485 / 672
![Page 486: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/486.jpg)
Two stage translation
v a l to_tree : A s t e x p r a s s i g n . exp r −> Pcode . t r e e
v a l l i n e a r i z e : Pcode . t r e e −> Pcode . program
v a l to_program : A s t e x p r a s s i g n . exp r −> Pcode . program
l e t rec to_tree ( e : exp r ) =match e with| Var s −> ( One l i n e (LOD s ) )| Num n −> ( One l i n e (LDC n ) )| P lu s ( e1 , e2 ) −>
Seq ( to_tree e1 ,Seq ( to_tree e2 , One l i n e ADI ) )
| As s i gn ( x , e ) −>Seq ( One l i n e (LDA x ) ,
Seq ( to_tree e , One l i n e STN) )
l e t rec l i n e a r i z e ( t : t r e e ) : program =match t with
One l i n e i −> [ i ]| Seq ( t1 , t2 ) −> ( l i n e a r i z e t1 ) @ ( l i n e a r i z e t2 ) ; ;
l e t to_program e = l i n e a r i z e ( to_tree e ) ; ;
486 / 672
![Page 487: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/487.jpg)
Source language AST data in C
• remember though: there are more dignified ways to designASTs . . .
487 / 672
![Page 488: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/488.jpg)
Code-generation via tree traversal (schematic)
p r o c e du r e genCode (T: t r e enode )beg i n
i f T /= n i lthen‘ ‘ g en e r a t e code to p r epa r e f o r code f o r l e f t c h i l d ’ ’ // p r e f i xgenCode ( l e f t c h i l d o f T) ; // p r e f i x ops‘ ‘ g en e r a t e code to p r epa r e f o r code f o r r i g h t c h i l d ’ ’ // i n f i xgenCode ( r i g h t c h i l d o f T) ; // i n f i x ops‘ ‘ g en e r a t e code to implement a c t i o n ( s ) f o r T’ ’ // p o s t f i x
end ;
488 / 672
![Page 489: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/489.jpg)
Code generation from AST+
• main “challenge”:linearization
• here: relatively simple• no control-flow constructs• linearization here (seea-grammar):
• string of p-code• not necessarily the best
choice (p-code might stillneed translation to “real”executable code)
preamble code
calc. of operand 1
fix/adapt/prepare ...
calc. of operand 2
execute operation
489 / 672
![Page 490: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/490.jpg)
Code generation
• slightly unstructured (since AST is unstructured)
490 / 672
![Page 491: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/491.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
491 / 672
![Page 492: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/492.jpg)
TAC manual translation again
Source
r ead x ; { i n pu t an i n t e g e r }i f 0<x then
f a c t := 1 ;r e p e a t
f a c t := f a c t ∗ x ;x := x −1
u n t i l x = 0 ;w r i t e f a c t { output : f a c t o r i a l o f x }
end
Target: TAC
r e ad x ; { i n pu t an i n t e g e r }i f 0<x then
f a c t := 1 ;r e p e a t
f a c t := f a c t ∗ x ;x := x −1
u n t i l x = 0 ;w r i t e f a c t { output : f a c t o r i a l o f x }
end
492 / 672
![Page 493: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/493.jpg)
Expression grammar
Grammar
exp1 → id = exp2exp → aexpaexp → aexp2 + factoraexp → factor
factor → ( exp )
factor → numfactor → id
(x=x+3)+4+
x=
+
x 3
4
493 / 672
![Page 494: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/494.jpg)
Three-address code data structures (some)
t ype symbol = s t r i n g
t ype exp r =| Var o f symbol| Num o f i n t| P lu s o f exp r ∗ exp r| As s i gn o f symbol ∗ exp r
t ype mem =Var o f symbol
| Temp o f symbol| Addr o f symbol (∗ &x ∗)
t ype operand = Const o f i n t| Mem o f mem
t ype cond = Bool o f operand| Not o f operand| Eq o f operand ∗ operand| Leq o f operand ∗ operand| Le o f operand ∗ operand
t ype r h s = Plus o f operand ∗ operand| Times o f operand ∗ operand| I d o f operand
t ype i n s t r =Read o f symbol
| Wr i te o f symbol| Lab o f symbol
(∗ pseudo i n s t r u c t i o n ∗)| A s s i gn o f symbol ∗ r h s| As s i gnR I o f operand ∗ operand ∗ operand
(∗ a := b [ i ] ∗)| A s s i g nL I o f operand ∗ operand ∗ operand
(∗ a [ i ] := b ∗)| BranchComp o f cond ∗ l a b e l| Ha l t| Nop
t ype t r e e = One l i n e o f i n s t r| Seq o f t r e e ∗ t r e e
t ype program = i n s t r l i s t
(∗ Branches a r e not so c l e a r . I t ake i n s p i r a t i o n f i r s t from ASU. I t seemstha t Louden has the TAC i f _ f a l s e t goto L . The Dragonbook a l l ow s a c t u a l l ymore complex s t r u c t u r e , namely compar i sons . However , two−armed b ranche s a r enot welcome ( tha t would be a t r e e − IR ) ∗)
(∗ Array a c c e s s : For a r r a y a c c e s s e s l i k e a [ i +1] = b [ j ] e t c . one cou ld adds p e c i a l commands . Louden i n d i c a t e s that , but a l s o i n d i c a t e s t ha t i f onehas i n d i r e c t a d d r e s s i n g and a r i t hm e t i c op e r a t i o n s , one does not needtho se . I n the TAC o f the dragon books , they have such op e r a t i o n s , so Iadd them he re as w e l l .
Of cou r s e one s u r e not a l l ow comp l e t e l y f r e eforms l i k e a [ i +1] = b [ j ] i n TAC, as t h i s i n v o l v e s more than 3a dd r e s s e s . Louden s u gg e s t s two ope r a t o r s , ‘ ‘ [ ]= ’ ’ and ‘ ‘ [ ]= ’ ’ .
We cou ld i n t r o d u c e more complex operands , l i k e a [ i ] but then we woulda l l ow non− t h r e e add r e s s code t h i n g s . We don ’ t do tha t ( o f cour se , thes yn tax i s a l r e a d y s l i g h t l y too l i b e r a l . . . )
∗)
• symbols: again strings for simplicity• again “trees” not really needed (for simple language withoutmore challenging control flow)
494 / 672
![Page 495: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/495.jpg)
Translation to three-address code
l e t rec to_tree ( e : exp r ) : t r e e ∗ temp =match e with
Var s −> ( One l i n e Nop , s )| Num i −> ( One l i n e Nop , s t r i n g_o f_ i n t i )| Ast . P lu s ( e1 , e2 ) −>
(match ( to_tree e1 , to_tree e2 ) with( ( c1 , t1 ) , ( c2 , t2 ) ) −>
l e t t = newtemp ( ) i n( Seq ( Seq ( c1 , c2 ) ,
One l i n e (As s i gn ( t ,
P lu s (Mem(Temp( t1 ) ) ,Mem(Temp( t2 ) ) ) ) ) ) ,t ) )
| Ast . As s i gn ( s ’ , e ’ ) −>l e t ( c , t2 ) = to_tree ( e ’ )i n ( Seq ( c ,
One l i n e ( As s i gn ( s ’ ,I d (Mem(Temp( t2 ) ) ) ) ) ) ,
t2 )
495 / 672
![Page 496: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/496.jpg)
Three-address code by synthesized attributes
• similar to the representation for p-code• again: purely synthesized• executing expressions/ssignments
• side-effect plus also• value
• two attributes (before: only 1)• tacode: instructions (as before, as string), potentially empty• name: “name” of variable or tempary, where result resides59
• evaluation of expressions: left-to-right (as before)
59In the p-code, the result of evaluating expression (also assignments) endsup in the stack (at the top). This, one does not need to capture it in anattribute.
496 / 672
![Page 497: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/497.jpg)
A-grammar
productions/grammar rules semantic rulesexp1 → id = exp2 exp1.name = exp2.name
exp1.tacode = exp2.tacode ++id .strvalˆ”=”ˆexp2.name
exp → aexp exp.name = aexp.nameexp.tacode = aexp.tacode
aexp1 → aexp2 + factor aexp1.name = newtemp()aexp1.tacode = aexp2.tacode ++ factor .tacode ++
aexp1.nameˆ”=”ˆaexp2.nameˆ”+”ˆfactor .name
aexp → factor aexp.name = factor .nameaexp.tacode = factor .tacode
factor → ( exp ) factor .name = exp.namefactor .tacode = exp.tacode
factor → num factor .name = num.strvalfactor .tacode = ””
factor → id factor .name = num.strvalfactor .tacode = ””
497 / 672
![Page 498: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/498.jpg)
Three-address code data structures (some)
t ype symbol = s t r i n g
t ype exp r =| Var o f symbol| Num o f i n t| P lu s o f exp r ∗ exp r| As s i gn o f symbol ∗ exp r
t ype mem =Var o f symbol
| Temp o f symbol| Addr o f symbol (∗ &x ∗)
t ype operand = Const o f i n t| Mem o f mem
t ype cond = Bool o f operand| Not o f operand| Eq o f operand ∗ operand| Leq o f operand ∗ operand| Le o f operand ∗ operand
t ype r h s = Plus o f operand ∗ operand| Times o f operand ∗ operand| I d o f operand
t ype i n s t r =Read o f symbol
| Wr i te o f symbol| Lab o f symbol
(∗ pseudo i n s t r u c t i o n ∗)| A s s i gn o f symbol ∗ r h s| As s i gnR I o f operand ∗ operand ∗ operand
(∗ a := b [ i ] ∗)| A s s i g nL I o f operand ∗ operand ∗ operand
(∗ a [ i ] := b ∗)| BranchComp o f cond ∗ l a b e l| Ha l t| Nop
t ype t r e e = One l i n e o f i n s t r| Seq o f t r e e ∗ t r e e
t ype program = i n s t r l i s t
(∗ Branches a r e not so c l e a r . I t ake i n s p i r a t i o n f i r s t from ASU. I t seemstha t Louden has the TAC i f _ f a l s e t goto L . The Dragonbook a l l ow s a c t u a l l ymore complex s t r u c t u r e , namely compar i sons . However , two−armed b ranche s a r enot welcome ( tha t would be a t r e e − IR ) ∗)
(∗ Array a c c e s s : For a r r a y a c c e s s e s l i k e a [ i +1] = b [ j ] e t c . one cou ld adds p e c i a l commands . Louden i n d i c a t e s that , but a l s o i n d i c a t e s t ha t i f onehas i n d i r e c t a d d r e s s i n g and a r i t hm e t i c op e r a t i o n s , one does not needtho se . I n the TAC o f the dragon books , they have such op e r a t i o n s , so Iadd them he re as w e l l .
Of cou r s e one s u r e not a l l ow comp l e t e l y f r e eforms l i k e a [ i +1] = b [ j ] i n TAC, as t h i s i n v o l v e s more than 3a dd r e s s e s . Louden s u gg e s t s two ope r a t o r s , ‘ ‘ [ ]= ’ ’ and ‘ ‘ [ ]= ’ ’ .
We cou ld i n t r o d u c e more complex operands , l i k e a [ i ] but then we woulda l l ow non− t h r e e add r e s s code t h i n g s . We don ’ t do tha t ( o f cour se , thes yn tax i s a l r e a d y s l i g h t l y too l i b e r a l . . . )
∗)
• symbols: again strings for simplicity• again “trees” not really needed (for simple language withoutmore challenging control flow)
498 / 672
![Page 499: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/499.jpg)
Another sketch of TA-code generation
switch k ind {case OpKind :
switch op {case Plus : {
tempname = new temorary name ;varname_1 = r e c u r s i v e c a l l on l e f t s u b t r e e ;varname_2 = r e c u r s i v e c a l l on r i g h t s u b t r e e ;emit ( "tempname␣=␣varname_1␣+␣varname_2" ) ;re tu rn ( tempname ) ; }
case Ass i gn : {varname = i d . f o r v a r i a b l e on l h s ( i n the node ) ;varname 1 = r e c u r s i v e c a l l i n l e f t s u b t r e e ;emit ( "varname␣=␣opname" ) ;re tu rn ( varname ) ; }
}case ConstKind ; { re tu rn ( cons tant − s t r i n g ) ; } // emit no th i ngcase I dK ind : { re tu rn ( i d e n t i f i e r ) ; } // emit no th i ng
}
• slightly more cleaned up (and less C-details) than in the book• “return” of the two attributes
• name of the variable (a temorary): officially returned• the code: via emit
• note: postfix emission only (in the shown cases) 499 / 672
![Page 500: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/500.jpg)
Generating code as AST methods
• possible: add genCode as method to the nodes of the AST60
• e.g.: define an abstract String genCodeTA() in the Exp class(or Node, in general all AST nodes where needed)
S t r i n g genCodeTA ( ) { S t r i n g s1 , s2 ; S t r i n g t = NewTemp ( ) ;s1 = l e f t . GenCodeTA ( ) ;s2 = r i g h t . GenCodeTA ( ) ;emit ( t + "=" + s1 + op + s2 ) ;re tu rn t
}
60Whether that is a good design from a compiler-perspective and codemaintenance, cluttering the AST with methods for code generation and godknows what else, e.g. type checking, optimization . . . , is a different question.
500 / 672
![Page 501: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/501.jpg)
Translation to three-address code
l e t rec to_tree ( e : exp r ) : t r e e ∗ temp =match e with
Var s −> ( One l i n e Nop , s )| Num i −> ( One l i n e Nop , s t r i n g_o f_ i n t i )| Ast . P lu s ( e1 , e2 ) −>
(match ( to_tree e1 , to_tree e2 ) with( ( c1 , t1 ) , ( c2 , t2 ) ) −>
l e t t = newtemp ( ) i n( Seq ( Seq ( c1 , c2 ) ,
One l i n e (As s i gn ( t ,
P lu s (Mem(Temp( t1 ) ) ,Mem(Temp( t2 ) ) ) ) ) ) ,t ) )
| Ast . As s i gn ( s ’ , e ’ ) −>l e t ( c , t2 ) = to_tree ( e ’ )i n ( Seq ( c ,
One l i n e ( As s i gn ( s ’ ,I d (Mem(Temp( t2 ) ) ) ) ) ) ,
t2 )
501 / 672
![Page 502: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/502.jpg)
Attributed tree (x=x+3) + 4
• note: room for optimization502 / 672
![Page 503: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/503.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
503 / 672
![Page 504: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/504.jpg)
“Static simulation”
• illustrated by transforming P-code → TA-code• very restricted setting: straight-line code• cf. also basic blocks (or elementary blocks)
• code without branching or other control-flow complications(jumps/conditional jumps. . . )
• often considered as basic building block for static/semanticanalyses,
• e.g. basic blocks as nodes in control-flow graphs, the“non-semicolon” control flow result in the edges.
• terminology: static simulation seems not widely established• cf. abstract interpretation, symbolic execution etc.
504 / 672
![Page 505: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/505.jpg)
P-code ⇒ TA-code via “static simulation”
• difference:• p-code operates on the stack• leaves the needed “temporary memory” implicit
• given the (straight-line) p-code:• traverse the code = list of instructions from beginning to end• seen as “simulation”
• conceptually at least, but also• concretely: the translation can make use of an actual stack
505 / 672
![Page 506: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/506.jpg)
From P-code to TA-code: illustration
506 / 672
![Page 507: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/507.jpg)
From TA-code to P-code: macro expansion
• also here: simplification, illustrating the general technique only• main simplification:
• register allocation• but: better done in just another optmization “phase”
Macro for general TAC instruction: a = b + b
l da al od b ; o r ‘ ‘ l d c b ’ ’ i f b i s a con s tl od c : o r ‘ ‘ l d c c ’ ’ i f c i s a con s tad isto
507 / 672
![Page 508: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/508.jpg)
Example: P-code ⇒ TA-code ((x=x+3)+4)
source TA-code
t1 = x + 3x = t2t2 = t1 + 4
Direct P-code
l da xl od xl d c 3ad istnldc 4ad i ; +
P-code via TA-code by macro exp.
;−−− t1 = x + 3l da t1l od xl d c 3ad isto;−−− x = t1l da xl od t1sto;−−− t2 = t1 + 4l da t2l od t1l d c 4ad isto
cf. indirect 13 instructions vs. direct: 7 instructions
508 / 672
![Page 509: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/509.jpg)
TAC via P-code
• as seen: detour lead to sub-optimal results (code size, alsoefficiency,
• basic deficiency: too many temporaries memory traffic etc)• several possibilities
• avoid it altogether, of course (but JIT)• chance for cope optimization phase• more clever macro expansion (sketch only)
the more clever macro expansion: some form of staticsimulation again
• don’t macro-expand the linear TAC (via static simulation)• brainlessly into another linear structure (P-code), but• “statically simulate” it into a more fancy structure (a tree)
509 / 672
![Page 510: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/510.jpg)
“Static simulation” into tree form (sketch only)
• more fancy form of “static simulation”• results: tree labelled with
• operator, together with• variables/tmporaries containing the results
+
+
x 3
4
t2
x,t1
• note: instruction x = t1 from TAC: does not lead to morenodes in the tree
510 / 672
![Page 511: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/511.jpg)
P-code generation from the thus generated tree
Tree from TAC+
+
x 3
4
t2
x,t1
Direct code = indirect code
l da xl od xl d c 3ad istnldc 4ad i ; +
• with the thusly (re-)constructed tree⇒ p-code generation
• as before done for the AST• remember: code as synthesized attributes
• in a way: the “trick” was: reconstruct the essential syntactictree structure (via “static simulation”) from the TA-code
511 / 672
![Page 512: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/512.jpg)
Compare: AST (with direct p-code attributes)
+
x∶=
+
x 3
4
result
lod x ldc 3
lod xldc 3adi
ldc 4
lda xlod xldc 3adi 3stn
512 / 672
![Page 513: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/513.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
513 / 672
![Page 514: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/514.jpg)
Status update: code generation
• so far: a number of simplifications• data types:
• integer constants only• no complex types (arrays, records, references, etc.)
• control flow• only expressions and• sequential composition⇒ straight-line code
514 / 672
![Page 515: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/515.jpg)
Address modes and address calculations
• so far,• just standard “variables” (l-variables and r-variables) and
temporaries, as in x = x +1• variables referred to by there names (symbols)
• but in the end: variables are represented by adresses• more complex address calculations needed
addressing modes in TAC:• &x: address of x (not fortemporaries!)
• *t: indirectly via t
addressing modes in P-code• ind i: indirect load• ixa a: indexed address
515 / 672
![Page 516: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/516.jpg)
Address calculations in TAC: x[10] = 2
• notationally represented as in C• “pointer arithmetic” and address calculation with the availablenumerical ops
t1 = &x + 10∗ t1 = 2
• 3-address-code data structure (e.g., quadrupel): extended
516 / 672
![Page 517: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/517.jpg)
Address calculations in P-code: x[10] = 2
• tailor-made commands for address calculation
• ixa i: integer scale factor
l da xl d c 10i x a 1l d c 2sto
517 / 672
![Page 518: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/518.jpg)
Array references and address calculations
i n t a [ SIZE ] ; i n t i , j ;a [ i +1] = a [ j ∗2 ] + 3 ;
• difference between left-hand use and right-hand use• arrays: stored sequentially, starting at base address• offset, calculated with a scale factor• for example: for a[i+1] (with C-style array implementation)61
a + (i+1) * sizeof(int)
• a here directly stands for the base address
61In C, arrays start at an 0-offset as the first array index is 0. Details maydiffer in other languages.
518 / 672
![Page 519: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/519.jpg)
Array accesses in TA code
• one possible way: assume 2 addition TAC instructions• remember: TAC can be seen as intermediate code, notinstruction set of a particular HW!
• 2 new instructions62
t2 = a [ t1 ] ; f e t c h v a l u e o f a r r a y e l ement
a [ t2 ] = t1 ; a s s i g n to the add r e s s o f an a r r a y e l ement
a [ i +1] = a [ j ∗2 ] + 3 ;
t1 = j ∗ 2t2 = a [ t1 ]t3 = t2 + 3t4 = i + 1a [ t4 ] = t3
62Still in TAC format. Apart from the “readable” notation, it’s just twoop-codes, say =[] and []=.
519 / 672
![Page 520: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/520.jpg)
Array accesses in TA code (2)
Expanding t2=a[t1]
t3 = t1 ∗ e l em_s ize ( a )t4 = &a + t3t2 = ∗ t4
Expanding a[t2]=t1
t3 = t2 ∗ e l em_s ize ( a )t4 = &a + t3∗ t4 = t1
• “expanded” result for a[i+1] = a[j*2] + 3
t1 = j ∗ 2t2 = t1 ∗ e l em_s ize ( a )t3 = &a + t2t4 = ∗ t3t5 = t4 +3t6 = i + 1t7 = t6 ∗ e l em_s ize ( a )t8 = &a + t7∗ t8 = t5
520 / 672
![Page 521: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/521.jpg)
Array accessses in P-code
Expanding t2=a[t1]
l da t2l da al od t1i x a e l ement_s i z e ( a )ind 0sto
Expanding a[t2]=t1
l da al od t2i x a e l em_s ize ( a )l od t1sto
• “expanded” result for a[i+1] = a[j*2] + 3l d a al o d il d c 1a d ii x a e l em_s ize ( a )l d a al o d jl d c 2mpii x a e l em_s ize ( a )i n d 0l d c 3a d is t o
521 / 672
![Page 522: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/522.jpg)
Extending grammar & data structures
• extending the previous grammar
exp → subs = exp2 ∣ aexpaexp → aexp + factor ∣ factor
factor → ( exp ) ∣ num ∣ subssubs → id ∣ id [ exp ]
522 / 672
![Page 523: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/523.jpg)
Syntax tree for (a[i+1]=2)+a[j]
+
=
a[]
+
i 1
2
a[]
j
523 / 672
![Page 524: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/524.jpg)
Code generation for P-code
vo id genCode ( SyntaxTree , i n t i sAddr ) {char c o d e s t r [ CODESIZE ] ;/∗ CODESIZE = max l e n g t h o f 1 l i n e o f P−code ∗/i f ( t != NULL) {
switch ( t−>kind ) {case OpKind :
{ switch ( t−>op ) {case Plus :
i f ( i s Add r e s s ) emitCode ( " E r r o r " ) ; // new checke l s e { // unchanged
genCode ( t−>l c h i l d , FALSE ) ;genCode ( t−>r c h i l d , FALSE ) ;emitCode ( " ad i " ) ; // a d d i t i o n
}break ;
case Ass i gn :genCode ( t−>l c h i l d ,TRUE) ; // ‘ ‘ l − v a l u e ’ ’genCode ( t−>r c h i l d , FALSE ) ; // ‘ ‘ r− v a l u e ’ ’emitCode ( " s tn " ) ;
524 / 672
![Page 525: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/525.jpg)
Code generation for P-code (“subs”)
• new code, of course
case Subs :s p r i n t f ( c o d e s t r i n g , "%s ␣%s " , " l d a " , t−>s t r v a l ) ;emitCode ( c o d e s t r i n g ) ;genCode ( t−> l c h i l d . FALSE ) ;s p r i n t f ( c o d e s t r i n g , "%s ␣%s ␣%s " ,
" i x a ␣ e l em_s ize ( " , t−>s t r v a l , " ) " ) ;emitCode ( c o d e s t r i n g ) ;i f ( ! i sAddr ) emitCode ( " i nd ␣0" ) ; // i n d i r e c t l o adbreak ;
de f au l t :emitCode ( " E r r o r " ) ;break ;
525 / 672
![Page 526: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/526.jpg)
Code generation for P-code (constants and identifiers)
case ConstKind :i f ( i sAddr ) emitCode ( " E r r o r " ) ;e l s e {
s p r i n t f ( code s t r , "%s ␣%s " , " l d s " , t−>s t r v a l ) ;emitCode ( c o d e s t r ) ;
}break ;
case I dK ind :i f ( i sAddr )
s p r i n t f ( code s t r , "%s ␣%s " , " l d a " , t−>s t r v a l ) ;e l s e
s p r i n t f ( code s t r , "%s ␣%s " , " l od " , t−>s t r v a l ) ;emitCode ( c o d e s t r ) ;break ;
de f au l t :emitCode ( " E r r o r " ) ;break ;
}}
}
526 / 672
![Page 527: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/527.jpg)
Access to records
typedef s t r u c t r e c {i n t i ;char c ;i n t j ;
} Rec ;. . .
Rec x ;
• fields with (statically known) offsets from base address• note:
• goal is: intermediate code generation platform independent• another way of seeing it: it’s still IR, not final machine code
yet.• thus: introduce function field_offset(x,j)• calculates the offset.• can be looked up (by the code-generator) in the symbol table⇒ call replaced by actual off-set
527 / 672
![Page 528: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/528.jpg)
Records/structs in TAC
• note: typically, records are implicitly references (as for objects)• in (our version of a) TAC: we can just use &x and *x
simple record access x.j
t1 = &x + f i e l d _ o f f s e t ( x , j )
left and right: x.j = x.i
t1 = &x + f i e l d _ o f f s e t ( x , j )t2 = &x + f i e l d _ o f f s e t ( x , j )∗ t1 = ∗ t2
528 / 672
![Page 529: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/529.jpg)
Field selection and pointer indirection in TAC
typedef s t r u c t t reeNode {i n t v a l ;s t r u c t t reeNode ∗ l c h i l d ,
∗ r c h i l d ;} t reeNode. . .
Treenode ∗p ;
assigments involving fields:
p −> l c h i l d = p ;p = p−>r c h i l d ;
t1 = p + f i e l d_a c c e s s (∗p , l c h i l d )∗ t1 = pt2 = p + f i e l d_a c c e s s (∗p , r c h i l d )p = ∗ t2
529 / 672
![Page 530: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/530.jpg)
Structs and pointers in P-code
• basically same basic “trick”• make use of field_offset(x,j)
p −> l c h i l d = p ;p = p−>r c h i l d ;
l od pl d c f i e l d _ o f f s e t (∗p , l c h i l d )i x a 1l od pstolda pl od pind f i e l d _ o f f s e t (∗p , r c h i l d )sto
530 / 672
![Page 531: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/531.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
531 / 672
![Page 532: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/532.jpg)
Control statements
• so far: basically straight-line code• intra-procedural63 control more complex thanks tocontrol-statements
• conditionals, switch/case• loops (while, repeat, for . . . )• breaks, gotos64, exceptions . . .
important “technical” device: labels• symbolic representation of addresses in static memory• specifically named (= labelled) control flow points• nodes in the control flow graph
• generation of labels (cf. temporaries)63“inside” a procedure. inter-procedural control-flow refers to calls and
returns, which is handled by calling sequences (which also maintain (in thestandard C-like language) the call-stack/RTE
64gotos are almost trivial in code generation (as they are basically availableat machine code level. Nonetheless, they are “considered harmful”, as theymess up everything else in a compiler/language.
532 / 672
![Page 533: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/533.jpg)
Loops and conditionals: linear code arrangement
if -stmt → if ( exp ) stmt else stmtwhile-stmt → while ( exp ) stmt
• challenge:• high-level syntax (AST) well-structured (= tree) which
implicitly (via its structure) determines complex control-flowbeyond SLC
• low-level syntax (TAC/P-code): rather flat, linear structure,ultimately just a sequence of commands
533 / 672
![Page 534: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/534.jpg)
Arrangement of code blocks and cond. jumps
534 / 672
![Page 535: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/535.jpg)
Jumps and labels: conditionals
if (E) then S1 else S2
TAC for conditional
<code to e v a l E1 to t1>i f_ f a l s e t1 goto L1<code f o r S1>goto L2l a b e l L1<code f o r S2>l a b e l L2
P-code for conditional
<code to e v a l u a t e E>f j p L1<code f o r S1>ujp L2l ab L1<code f o r S2>l ab L3
• 3 new op-codes:• ujp: unconditional jump
(“goto”)• fjp: jump on false• lab: label (for pseudo
instructions)535 / 672
![Page 536: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/536.jpg)
Jumps and labels: while
while (E) S
TAC for while
l a b e l L2<code to e v a l u a t e E to t1>i f_ f a l s e t1 goto L2<code f o r S>goto L1l a b e l L2
P-code for while
l ab L1<code to e v a l u a t e E>f j p L2<code f o r S>ujp L1l ab L2
536 / 672
![Page 537: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/537.jpg)
Boolean expressions
• two alternatives for treatment1. as ordinary expressions2. via short-circuiting
• ultimate representation in HW:• no built-in booleans (HW is generally untyped)• but “arithmetic” 0, 1 work equivalently & fast• bitwise ops which corresponds to logical ∧ and ∨ etc
• comparison on “booleans”: 0 < 1?• boolean values vs. jump conditions
537 / 672
![Page 538: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/538.jpg)
Short circuiting boolean expressions
i f ( ( p!=NULL) && p −> v a l l ==0)) . . .
• done in C, for example• semantics must fix evaluation order• note: logically equivalent a ∧ b = b ∧ a
• cf. to conditionalexpressions/statements (alsoleft-to-right)
a and b ≜ if a then b else falsea or b ≜ if a then true else b
l od xl d c 0neq ; x !=0 ?f j p L1; jump , i f x=0l od yl od xequ ; x =? yujp L2 ; hop ove rl ab L1l d c FALSEl ab L2
• new op-codes• equ• neq
538 / 672
![Page 539: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/539.jpg)
Grammar for loops and conditional
stmt → if -stmt ∣ while-stmt ∣ break ∣ otherif -stmt → if ( exp ) stmt else stmt
while-stmt → while ( exp ) stmtexp → true ∣ false
• note: simplistic expressions, only true and false
typedef enum {ExpKind , I f k i n d , Whi lek ind ,BreakKind , OtherKind } NodeKind ;
typedef s t r u c t s t r e e nod e {NodeKind k ind ;s t r u c t s t r e e nod e ∗ c h i l d [ 3 ] ;i n t v a l ; /∗ used wi th ExpKind ∗/
/∗ used f o r t r u e vs . f a l s e ∗/} STreeNode ;
type StreeNode ∗ SyntaxTree ;
539 / 672
![Page 540: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/540.jpg)
Translation to P-code
i f ( t r u e ) whi le ( t r u e ) i f ( f a l s e ) break e l s e o th e r
l d c t r u ef j p L1l ab L2l d c t r u ef j p L3l d c f a l s ef j p L4ujp L3ujp L5l ab L4Otherl ab L5ujp L2l ab L3l ab L1
540 / 672
![Page 541: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/541.jpg)
Code generation
• extend/adapt genCode• break statement:
• absolute jump to place afterwards• new argument: label to jump-to when hitting a break
• assume: label generator genLabel()• case for if-then-else
• has to deal with one-armed if-then as well: test for NULL-ness
• side remark: control-flow graph (see also later)• labels can (also) be seen as nodes in the control-flow graph• genCode generates labels while traversing the AST⇒ implict generation of the CFG• also possible:
• separate the CFG first• as (just another) IR• generate code from there
541 / 672
![Page 542: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/542.jpg)
Code generation procedure for P-code
542 / 672
![Page 543: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/543.jpg)
Code generation (1)
543 / 672
![Page 544: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/544.jpg)
Code generation (2)
544 / 672
![Page 545: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/545.jpg)
More on short-circuiting (now in 3AC)
• boolean expressions contain only two (official) values: true andfalse
• as stated: boolean expressions are often treated special: viashort-circuiting
• short-cicruiting especially for boolean expressions inconditionals and while-loops and similar
• short-circuiting: specified in the language definition (or not)
545 / 672
![Page 546: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/546.jpg)
Example for short-circuiting
Source
i f a < b | |( c > d && e >= f )
thenx = 8
e l s ey = 5
end i f
TAC
t1 = a < bi f_true t1 goto 1 // s h o r t c i r c u i tt2 = c > di f_ f a l s e goto 2 // s h o r t c i r c u i tt3 = e >= fi f_ f a l s e t3 goto 2l a b e l 1x = 8goto 3l a b e l 2y=5l a b e l 3
546 / 672
![Page 547: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/547.jpg)
Code generation: conditionals (as seen)
547 / 672
![Page 548: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/548.jpg)
Alternative P/TA-Code generation for conditionals
• Assume: no break in the language
548 / 672
![Page 549: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/549.jpg)
Alternative TA-Code generation for boolean expressions
549 / 672
![Page 550: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/550.jpg)
INF5110 – Compiler Construction
Code generation
2. 02. 2016
550 / 672
![Page 551: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/551.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
551 / 672
![Page 552: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/552.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
552 / 672
![Page 553: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/553.jpg)
Code generationa
aThis section is based on slides from Stein Krogdahl, 2015.
• note: code generation so far: AST+ to intermediate code• three address code• P-code
• ⇒ intermediate code generation• i.e., we are still not there . . .• material here: based on the (old) dragon book [?]• there is also a new edition [?]
553 / 672
![Page 554: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/554.jpg)
Intro: code generation
• goal: translate TA-code (=3A-code) to machine language• machine language/assembler:
• even more restricted• 2 address code
• limited number of registers• different address modes with different costs (registers vs. mainmemory)
Goals• efficent codea
• small code size also desirable• but first of all: correct codeaWhen not said otherwise: efficiency refers in the following to efficiency of
the generated code. Fastness of compilation may be important as well (andsame for the size of the compiler itself, as opposed to the size of the generatedcode). Obviously, there are trade-offs to be made.
554 / 672
![Page 555: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/555.jpg)
Code “optimization”
• often conflicting goals• code generation: prime arena for achieving efficiency• optimal code: undecidable anyhow,• even for many more clearly defined subproblems: untractable• “optimization”: interpreted as: heuristics to achieve “goodcode” (without hope for optimal code)
• due to importance of optimization at code generation• time to bring out the “heavy artillery”• so far: all techniques (parsing, lexing, even type checking) are
computationally “easy”• at code generation/optmization: perhaps invest in agressive,
computationally complex and rather advanced techniques• many different techniques used
555 / 672
![Page 556: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/556.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
556 / 672
![Page 557: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/557.jpg)
2-address machine code used here
• “typical” op-codes, but not a instruction set of a concretemachine
• two address instructions• Note: cf. 3-address-code interpmediate representation vs.2-address machine code
• machine code is not lower-level/closer to HW because it hasone argument less than 3AC
• it’s just one illustrative choice• the new dragon book: uses 3-address-machine code (being
more modern)
• 2 address machine code: closer to CISC architectures,• RICS architectures rather use 3AC.• translation task from IR to 3AC or 2AC: comparable challenge
557 / 672
![Page 558: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/558.jpg)
2-address instructions format
FormatOP source dest
• note: order of arguments65
• restriction on source and target• register or memory cell• source: can additionally be a constant
ADD a b // b := a + bSUB a b // b := b − aMUL a b // b := b + aGOTO i // u n c o n d i t i o n a l jump
• further opcodes for conditional jumps, procedure calls . . . .
65In the 2A machine code in Louden, for instance in page 12 or theintroductory slides, the order is the opposite!
558 / 672
![Page 559: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/559.jpg)
Side remark: 3A machine code
Possible format
OP sou r c e1 sou r c e2 de s t
• but then: what’s the difference to 3A intermediate code?• apart from a more restricted instruction set:• restriction on the operands, for example:
• only one of the arguments allowed to be a memory cell• no fancy addressing modes (indirect, indexed . . . see later) for
memory cell, only for registers• not “too much” memory-register traffic back and forth permachine instruction
• example:
&x = &y + *z
may be 3A-intermediate code, but not 3A-machine code559 / 672
![Page 560: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/560.jpg)
“Cost model”
• “optimization”: need some well-defined “measure” of the“quality” of the produced code
• interested here in execution time• not all instructions take the same time• estimation of execution• cost factors:
• size of the instruction• it’s here not about code size, but• instructions need to be loaded• longer instructions => perhaps longer load
• address modes (as additional costs: see later)• registers vs. main memory vs. constants• direct vs. indirect, or indexed access
• factor outside our control/not part of the cost model: effect ofcaching
560 / 672
![Page 561: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/561.jpg)
Instruction modes and additional costs
Mode Form Address Added costabsolute M M 1register R R 0indexed c(R) c + cont(R) 1
indirect register *R cont(R) 0indirect indexed *c(R) cont(c + cont(R)) 1
literal #M the value M 1 only for source
• indirect: useful for elements in “records” with known off.set• indexed: useful for slots in arrays
561 / 672
![Page 562: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/562.jpg)
Examples a := b + c
Using registers
MOV b , R0 // R0 = bADD c , R0// R0 = c + R0MOV R0 , a // a = R0
co s t = 6
Memory-memory ops
MOV b , a // a = bADD c , a // a = c + a
co s t = 6
Data already in registers
MOV ∗R1 , ∗R0 // ∗R0 = ∗R1ADD ∗R2 , ∗R1// ∗R1 = ∗R2 + ∗R1
co s t = 2
Assume R0, R1, and R2 containaddresses for a, b, and c
Storing back to memory
ADD R2 , R1// R1 = R2 + R1MOV R1 , a // a = R1
co s t = 3
Assume R1 and R2 contain valuesfor b, and c
562 / 672
![Page 563: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/563.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
563 / 672
![Page 564: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/564.jpg)
Control-flow graphs
CFGbasically: graph with
• nodes = basic blocks• edges = (potential) jumps (and “fall-throughs”)
• here (as often): CFG on 3AC (linear intermediate code)• also possible CFG on intermediate code,• or even:
• CFG extracted from AST• here: the opposite: synthesizing a CFG from the linear code
• explicit data structure (as another intermediate representation)or implicit only.
564 / 672
![Page 565: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/565.jpg)
From 3AC to CFG: “partitioning algo”
• remember: 3AC contains labels and (conditional) jumps⇒ algo rather straightforward• the only complication: some labels can be ignored• we ignore procedure/method calls here• concept “leader” representing the nodes/basic blocks
Leader• first line is a leader• GOTO i : line labelled i is a leader• instruction after as GOTO is a leader
Basic blockinstruction sequence from (and including) one leader to (butexcluding) the next leader or to the end of code
565 / 672
![Page 566: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/566.jpg)
Partitioning algo
• note: no line jumps to L2
566 / 672
![Page 567: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/567.jpg)
3AC for faculty
read xt1 = x > 0i f_ f a l s e t1 goto L1f a c t = 1l a b e l = L2t2 = f a c t ∗ xf a c t = t2t3 = x −1x = t3t4 = x == 0i f_ f a l s e t4 goto L2wr i te f a c tl a b e l L1ha l t
567 / 672
![Page 568: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/568.jpg)
Faculty: CFG
• goto/conditionalgoto: never insideblock
• not every block• ends in a goto• starts with a label
• ignored here:function/method calls
• intra-proceduralcontrol-flow graph
568 / 672
![Page 569: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/569.jpg)
Levels of analysis
• here: three levels where to apply code analysis / optimization1. local: per basic block (block-level)2. global: per function body/intra-procedural CFG66
3. inter-procedural: really global, whole-program analysis
• the “more global”, the more costly the analysis and, especiallythe optimization (if done at all)
66the terminology “global” is not ideal, for instance process-level analysis(called global here) can be seen as procedure-local, as opposed tointer-procedural analysis which then is global.
569 / 672
![Page 570: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/570.jpg)
Loops in CFGs
• loop optimization: “loops” are thankful places for optimizations• important for analysis to detect loops (in the cfg)• importance of loop discovery: not too important any longer inmodern languages.
Loops in a CFG vs. graph cycles• concept of loops in CFGs not identical with cycles in a grapha
• all loops are graph cycles but not vice versaaOtherwise: cycle/loop detection not worth much discussion
• intuitively: loops are cycles orinating from source-level loopingconstructs (“while”)
• goto’s may lead to non-loop cycles in the CFG• importants of loops: loops are “well-behaved” whenconsidering certain optimizations/code transformations (goto’scan destroy that. . . )
570 / 672
![Page 571: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/571.jpg)
Loops in CFGs: definition
• remember: strongly connected components
Definition (Loop)A loop L in a CFG is a collection of nodes s.t.:
• strongly connected component (with edges completely in L
• 1 (unique) entry node of L, i.e. no node in L has an incomingedgea from outside the loop except the entry
aalternatively: general reachability
• often also: “root” node of a CFG (there’s only one) is not itselfan entry of a loop
571 / 672
![Page 572: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/572.jpg)
Loop
B0
B1
B2 B3
B4
B5
• Loops:• {B3,B4}• {B4,B3,B1,B5,B2}
• Non-loop:• {B1,B2,B5}
• unique entry marked red
572 / 672
![Page 573: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/573.jpg)
Loops as fertile ground for optimizations
whi le ( i < n ) { i ++; A[ i ] = 3∗k }
• possible optimizations• move 3*k “out” of the loop• put frequently used variables into registers while in the loop
(like i)
• when moving out computation from the look:• put “right in front of the loop”⇒ add extra node/basic block in front of the entry of the loop67
67that’s one of pragmatic motivation for unique entry.573 / 672
![Page 574: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/574.jpg)
Loop non-examples
574 / 672
![Page 575: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/575.jpg)
Data flow analysis in general
• general analysis technique working on CFGs• many concrete forms of analyses• such analyses: basis for (many) optimizations• data: info stored in memory/temporaries/registers etc.• control:
• movement of the instruction pointer• abstractly represented by the CFG
• inside elementary blocks: increment of the IS• edges of the CFG: (conditional) jumps• jumps together with RTE and calling convention
Data flowing from (a) to (b)Given the control flow (normally as CFG): is it possible (or is itguaranteed) that some “data” originating at one control-flow point(a) reach another control flow point (b).
575 / 672
![Page 576: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/576.jpg)
Data flow as abstraction
• data flow analysis: fundamental and important static analysistechnique
• it’s impossible to decide statically if data from (a) actually“flows to” (b)
⇒ approximative• therefore: work on the CFG: if there is two options/outgoingedges: consider both
• Data-flow answers therefore approximatively• if it’s possible that the data flows from (a) to (b)• it’s neccessary or unavoidable that data flows from (a) to (b)
• for basic blocks: exact answers possible68
68static simulation here was done for basic blocks only and for the purpose oftranslation. The translation of course needs to be exact, non-approximative.Symbolic evaluation also exist (also for other purposes) in more general forms,especially also working approximatively on abstractions.
576 / 672
![Page 577: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/577.jpg)
Data flow analysis: Liveness
• prototypical / important data flow analysis• especially important for register allocation
Basic questionWhen (at which control-flow point) can I be sure that I don’t needa specific variable (temporary, register) any more?
• optimization: if sure that not needed in the future: registercan be used otherwise
Definition (Live)a variable is live at a given control-flow point if there exists anexecution starting from there, where the variable is used in thefuture.a
aThat corresponds to static liveness (the notion the static liveness analysisdeals with. A variable in a given concrete execution of a program is dynamicallylive if in the future, it is still needed (or, for non-deterministic programs: if thereexists a future, where it’s still used. Dynamic liveness is undecidable, obviously.
577 / 672
![Page 578: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/578.jpg)
Definitions and uses of variables
• when talking about “varaibles”: also temporary variables aremeant.
• basic notions underlying most data-flow analyses (includingliveness analysis)
• here: def’s and uses of variables (or temporaries etc)• all data, including intermediate results) has to be storedsomewhere, in variables, temporaries, etc.
• a “definition” of x = assignment to x (store to x)• a “use” of x : read content of x (load x)• variables can occur more than once, so• a definition/use refers to instances or occurrences of variables(“use of x in line l ” or “use of x in block b ”)
• same for liveness: “x is live here, but not there”
578 / 672
![Page 579: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/579.jpg)
Defs, uses, and liveness
0: x = v + w
. . .
2: a = x + c
3: x =u + v4: x = w
5: d = x + y
• x is “defined” (= assignedto) in 0 and 4
• x is live “in” (= at the endof) block 2, as it may beused in 5
• a non-live variable at somepoint: “dead” which means:the corresponding memorycan be reclaimed.
• note: here liveness acrossblock-boundaries = “global”(but blocks contain only oneinstruction here)
579 / 672
![Page 580: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/580.jpg)
Def-use or use-def analysis
• use-def: given a “use”: determine all possible “definitions”69
• def-use: given a “def”: determine all possible “uses”• for straight-line-code/inside one basic block
• deterministic: each line has has exactly one place where agiven variable has been assigned to last (or else not assigned toin the block). Equivalently for uses.
• for whole CFG:• approximative (“may be used in the future”)• more advanced techiques (caused by presence of loops/cycles)
• def-use analysis:• closely connected to liveness analysis (basically the same)• prototypical data-flow question (same for use-def analysis),
related to many data-flow analyses (but not all)• Side remark: Static single-assignment (SSA) format:
• at most one assignment per variable.• “definition” (place of assignment) for each variable thus clear
from its name69remember: “defs” and “uses” refer to instances of definitions/assignments
in the graph580 / 672
![Page 581: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/581.jpg)
Calculation of def/uses (or liveness . . . )
• three levels of complication1. inside basic block2. branching (but no loops)3. Loops4. [even more complex: inter-procedural analysis]
For SLC/inside basic block• determnistic result• simple “one-pass” treatmentenough
• similar to “static simulation”
For whole CFG• iterative algo needed• dealing withnon-determinism:over-approximation
• “closure” algorithms, similarto the way e.g., dealing withfirst and follow sets
• = fix-point algorithms
581 / 672
![Page 582: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/582.jpg)
Inside one block: optimizing use of temporaries
• simple setting: intra-block analysis & optimization, only• temporaries:
• symbolic representations to hold intermediate results• generated on request, assuming unbounded numbers• intentions: use registers
• limited about of register available (platform dependent)
Assumption• temp’s don’t transfer data accross blocks (/= program var’s)⇒ temp’s dead at the beginning and at the end of a block
• but: variables have to be considered live at the end of a block(block-local analysis, only)
582 / 672
![Page 583: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/583.jpg)
Intra-block liveness
t1 := a − bt2 := t1 ∗ aa := t1 ∗ t2t1 := t1 − ca := t1 ∗ a
• neither temp’s nor vars: “singleassignment”,
• but first occurrence of a temp in ablock: a definition (but for temps itwould often be the case)
• let’s call operand: variables or temp’s• next use of an operand:• uses of operands: on the lhs’s,definitions on the rhs’s
• not good enough to say “t1 is live inline 4”
Note: the TAIC may allow also literal constants as operatorarguments, they don’t play a role right now.
583 / 672
![Page 584: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/584.jpg)
DAG of the block
∗
∗ −
∗
−
a0 b0 c0
a
a t1
t2
t1
• no linear order (as in code),only partial order
• the next use: meaningless• but: all “next” uses visible (ifany)
• node = occurrences of avariable on the rhs(“definitions”)
• e.g.: the “lower node” for“defining”assigning to t1 has/three uses
584 / 672
![Page 585: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/585.jpg)
DAG / SA
∗
∗ −
∗
−
a0 b0 c0
a2
a1 t11
t02
t01
585 / 672
![Page 586: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/586.jpg)
Intra-block liveness: idea of algo
• liveness-status of an operand: differentfrom lhs vs. rhs in a given instructiona
• informal definition: an operand is liveat some occurrence, if it’s used someplace in the future
abut if one occurrence of (say) x in a rhs x + x is live, so is the otheroccurrence.
Definition (consider statement x1 ∶= x2 op x3)• A variable x is live at the beginning of x1 ∶= x2 op x3, if
1. if x is the same variable than x2 or x3, or2. if x live at its end, provided x and x1 are different variables
• A variable x is live at the end of an instruction,• if it’s live at beginning of the next instruction• if no next instruction
• temp’s are dead• user-leval variables are (assumed) live
586 / 672
![Page 587: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/587.jpg)
Liveness
Previous “inductive” definitionexpresses liveness status of variables before a statement dependenton the liveness status of variables after a statement (and thevariables used in the statement)
• core of a straightforward iterative algo• simple backward scan70
• the algo we sketch:• not just boolean info (live = yes/no), instead:• operand live?
• yes, and with next use inside is block (and indicate instructionwhere)
• yes, but with no use inside this block• not live:
• even more info: not just that but indicate, where’s the next use70Remember: intra-block/SLC. In the presence of loops/analysing a
complete CFG, a simple 1-pass does not suffice. More advanced techniques(“multiple-scans” = fixpoint calculations) are needed then.
587 / 672
![Page 588: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/588.jpg)
Algo: dead or alive (binary info only)
// −−−−− i n i t i a l i s e T −−−−−−−−−−−−−−−−−−−−−−−−−−−−
f o r a l l e n t r i e s : T[ i , x ] := Dexcep t : f o r a l l v a r i a b l e s a // but not temps
T[ n , a ] := L ,//−−−−−−− backward pas s −−−−−−−−−−−−−−−−−−−−−−−−−−−−
f o r i n s t r u c t i o n i = n−1 down to 0l e t c u r r e n t i n s t r u c t i o n i : x ∶= y op z ;
T[ i . y ] := LT[ i . z ] := LT[ i . x ] := D // note o r d e r ; x can ‘ ‘ equa l ’ ’ y o r z
end
• Data structure T : table, mapping for each line/instruction iand variable: boolean status of “live”/“dead”
• represents liveness status per variable at the end (i.e. rhs) ofthat line
• basic block: n instructions, from 1 until n, where “line 0”represents the sentry imaginary line “before” the first line (noinstruction in line 0)
• backward scan through instructions/lines from n to 0588 / 672
![Page 589: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/589.jpg)
Algo′: dead or else: alive with next use
• More refined information• not just binary “dead-or-alive” but next-use info⇒ three kinds of information
1. Dead: D2. Live:
• with local line number of next use: L(n)• potential use of outside local basic block L(�)
• otherwise: basically the same algo
// −−−−− i n i t i a l i s e T −−−−−−−−−−−−−−−−−−−−−−−−−−−−
f o r a l l e n t r i e s : T[ i , x ] := Dexcep t : f o r a l l v a r i a b l e s a // but not temps
T[ n , a ] := L(�) ,//−−−−−−− backward pas s −−−−−−−−−−−−−−−−−−−−−−−−−−−−
f o r i n s t r u c t i o n i = n−1 down to 0l e t c u r r e n t i n s t r u c t i o n i : x ∶= y op z ;
T[ i , y ] := L(i + 1)T[ i , z ] := L(i + 1)T[ i , x ] := D // note o r d e r ; x can ‘ ‘ equa l ’ ’ y o r z
end
589 / 672
![Page 590: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/590.jpg)
Run of the algo′
line a b c t1 t2[0] L(1) L(1) L(4) L(2) D1 L(2) L(�) L(4) L(2) D2 D L(�) L(4) L(3) L(3)3 L(5) L(�) L(4) L(4) D4 L(5) L(�) L(�) L(5) D5 L(�) L(�) L(�) D D
t1 := a − bt2 := t1 ∗ aa := t1 ∗ t2t1 := t1 − ca := t1 ∗ a
590 / 672
![Page 591: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/591.jpg)
Liveness algo remarks
• here: T data structure traces (L/D) status per variable × “line”• in the remarks in the notat:
• alternatively: store liveness-status per variable only• works as well for one-pass analyses (but only without loops)
• this version here: corresponds better to global analysis: 1 linecan be seen as one small basic block
591 / 672
![Page 592: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/592.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
592 / 672
![Page 593: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/593.jpg)
Simple code generation algo
• simple algo: intra-block code generation• core problem: register use• register allocation & assignment 71
• hold calculated values in registers longest possibe• intra-block only ⇒ at exit:
• all variables stored back to main memory• all temps assumed “lost”
• remember: assumptions in the intra-block liveness analysis
71some distinguish register allocation: “should the data be held in register(and how long)” vs. register assignment: “which of available register to use forthat”
593 / 672
![Page 594: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/594.jpg)
Limitations of the code generation
• local intra block:• no analysis across blocks• no procedure calls etc
• no complex data structures• arrays• pointers• . . .
some limitations on how the algo itself works for one block• read-only variables: never put in registers, even if variable isrepeatedly read
• algo works only with the temps/variables given and does notcome up with new ones
• for instance: DAGs could help• no semantics considered
• like commutativity: a + b equals b + a
594 / 672
![Page 595: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/595.jpg)
Purpose and “signature” of the getreg function
• one core of the code generation algo• simple code-generation here ⇒ simple getreg
getreg functionavailable: liveness/next-use info
Input: TAIC-instruction x ∶= y op z
Output: return location where x is to be stored
• location: register (if possible) or memory location
595 / 672
![Page 596: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/596.jpg)
Coge generation invariant
it should go without saying . . . :
Basic safetey invariantAt each point, “live” variables (with or without next use in thecurrent block) must exist in at least one location
• another (specific) invariant: the location returned by getreg:the one where the rhs of a 3AIC assignment ends up
596 / 672
![Page 597: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/597.jpg)
Register and address-descriptors
• code generation/getreg : keep track of1. register contents2. addresses for names
Register descriptor• tracking current content ofreg’s (if any)
• consulted when new regneeded
• as said: at block entry,assume all regs unused
Address descriptor• tracking location(s) wherecurrent value of name canbe found
• possible locations: register,stack location, main memory
• > 1 location possible (butnot due tooverapproximation, exacttracking)
597 / 672
![Page 598: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/598.jpg)
Code generation algo for x ∶= y op z
1. determine location (preferably register) for result
L = ge t r e g ( ‘ ‘ x := y op z ’ ’ )
2. make sure, that the value of y is in L :• consult address descriptor for y ⇒ current locations y ′ for y• choose the best location y ′ from those (register)• if value of y not in L, generate
MOV y ’ , L
3. generate
OP z ’ , L // z ’ : a c u r r e n t l o c a t i o n o f z ( p r e f e r reg ’ s )
• update address descriptor x ↦ L• if L is a reg: update reg descriptor L↦ x
4. exploit liveness/next use info: update register descriptors
598 / 672
![Page 599: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/599.jpg)
Skeleton code generation algo for x ∶= y op z
l = getreg ( ‘ ‘ x := y op z ’ ’ ) // t a r g e t l o c a t i o n f o r xi f l ∉ Ta(y) then l e t ly ∈ Ta(y)) i n emit ("MOV ly , l " ) ;l e t z ′ ∈ Ta(z)) i n emit ("OP z ′, l " ) ;
• “skeleton”• non-deterministic: we igored how to choose z ′ and y ′
• we ignore book-keeping in the name and address descriptortables
• details of getreg hidden.
599 / 672
![Page 600: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/600.jpg)
Non-deterministic code generation algo for x ∶= y op z
L = getreg ( ‘ ‘ x := y op z ’ ’ ) // g en e r a t e t a r g e t l o c a t i o n f o r xi f L ∉ Ta(y)then l e t y ′ ∈ Ta(y)) // p i c k a l o c a t i o n f o r y
i n emit (MOV y ’ , L )e l s e sk i p ;l e t z ′ ∈ Ta(z)) i n emit ( ‘ ‘OP z ′ , L ’ ’ ) ;Ta ∶= Ta[x ↦ L] ;i f L i s a r e g i s t e rthen Tr ∶= Tr [L↦ x]
600 / 672
![Page 601: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/601.jpg)
Code generation algo for x ∶= y op z
l = getreg (" i : x := y op z ") // i f o r i n s t r u c t i o n s l i n e number/ l a b e li f l ∉ Ta(y)then l e t ly = best (Ta(y))
i n emit ("MOV ly , l ")e l s e sk i p ;l e t lz = best (Ta(z))i n emit ("OP lz , l " ) ;Ta ∶= Ta/(_↦ l) ;Ta ∶= Ta[x ↦ l] ;Tr ∶= Tr [l ↦ x] ;
i f ¬Tlive[i , y] and Ta(y) = r then Tr ∶= Tr /(r ↦ y)i f ¬Tlive[i , z] and Ta(z) = r then Tr ∶= Tr /(r ↦ z)
601 / 672
![Page 602: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/602.jpg)
Code generation algo for x ∶= y op z (Notat)
l = getreg (" x := y op z ")i f l ∉ Ta(y)then l e t ly = best (Ta(y))
i n emit ("MOV ly , l ")e l s e sk i p ;l e t lz = best (Ta(z))i n emit ("OP lz , l " ) ;
Ta ∶= Ta/(_↦ l) ;Ta ∶= Ta[x ↦ l] ;Tr ∶= Tr [l ↦ x]
602 / 672
![Page 603: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/603.jpg)
Exploit liveness/next use info: recycling registers
• register descriptors: don’t update themselves during codegeneration
• once set (for instance as R0 ↦ t), the info stays, unless reset• thus in step 4 for z ∶= x op y :
to exploit liveness info by recycling reg’sif y and/or z are currently
• not live and are• in registers,
⇒ “wipe” the corresponding info from the corresponding registerdescriptors
• side remark: for address descriptor• no such “wipe” neeed, because it won’t make a difference (y
and/or z are not-live anyhow)• their address descriptor wont’ be consulted further in the block
603 / 672
![Page 604: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/604.jpg)
getreg algo: x ∶= y op z
• goal: return a location for x• basically: check possibilities of register uses,• start with the cheapest option
do the following steps, in that order1. in place if x is in a register already (and if that’s fine
otherwise), then return the register.2. new register if there’s an unsused register: return that3. purge filled register choose more or less cleverly a filled register
and save its content, if needed, and return thatregister
4. use main memory if all else fails
604 / 672
![Page 605: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/605.jpg)
getreg algo: x ∶= y op z in more details
1. if• y in register R• R holds no alternative names• y is not live and has no next use after the 3AIC instruction• ⇒ return R
2. else: if there is an empty register R ′: return R ′
3. else: if• x has a next use [or operator requires a register] ⇒
• find an occupied register R• store R into M if needed (MOV R, M))• don’t forget to update M ’s address descriptor, if needed• return R
4. else: x not used in the block or no suituable occupied registercan be found
• return x as location L
• choice of purged register: heuristics• remember (for step 3): registers may contain value for > 1variable ⇒ multiple MOV’s
605 / 672
![Page 606: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/606.jpg)
Sample TAIC
d := (a-b) + (a-c) + (a-c)
t := a − bu := a − cv := t + ud := v + u
line a b c d t u v
[0] L(1) L(1) L(2) D D D D1 L(2) L(�) L(2) D L(3) D D2 L(�) L(�) L(�) D L(3) L(3) D3 L(�) L(�) L(�) D D L(4) L(4)4 L(�) L(�) L(�) L(�) D D D
606 / 672
![Page 607: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/607.jpg)
Code sequence
3AIC 2AC reg. descr. addr. descriptorR0 R1 a b c d t u v
[0] � � a b c d t u v1 t := a − b MOV a, R0 [a] [R0]
SUB b, R0 t ��R0 R02 u := a − c MOV a, R1 ⋅ [a] [R0]
SUB c, R1 u ��R0 R13 v := t + u ADD R1, R0 v ⋅ ��R0 R04 d := v + u ADD R1, R0 d R0 ��R0
MOV R0, dRi : unused all var’s in “home position”
• address descriptors: “home position” not explictly needed.• for instance: variable a always to be found “at a ”, as indicatedin line “0”.
• in the table: only changes (from top to bottom) indicated• after line 3:
• t dead• t resides in R0 (and nothing else in R0)→ reuse R0
607 / 672
![Page 608: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/608.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
608 / 672
![Page 609: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/609.jpg)
From “local” to “global” data flow analysis
• data stored in variables, and “flows from definitions to uses”• liveness analysis
• one prototypical (and important) data flow analysis• so far: intra-block = straight-line code
• related to• def-use analysis: given a “definition” of a variable at some
place, where it is (potentially) used• use-def: (the inverse question, “reaching definitions”
• other similar questions:• has a value of an expression been calculated before (“available
expressions”)• will an expression be used in all possible branches (“very busy
expressions”)
609 / 672
![Page 610: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/610.jpg)
Global data flow analysis
• block-local• block-local analysis (here liveness): exact information possible• block-local liveness: 1 backward scan• important use of liveness: register allocation, temporaries
typically don’t survive blocks anyway
• global: working on the complete CFG
2 complications• branching: non-determinism, unclear which branch is taken• loops in the program (loops/cycles in the graph): a simple onepass through the graph does not cut it any longer
• exact answers no longer possible (undecidable)⇒ work with safe approximations• this is: general characteristic of DFA
610 / 672
![Page 611: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/611.jpg)
Generalizing block-local liveness analysis
• assumptions for block-local analysis• all program variables (assumed) live at the end of each basic
block• all temps are assumed dead there.72
• now: we do better, info accross blocks
at the end of each block:which variables may be use used in subsequent block(s).
• now: re-use of temporaries (and thus corresponding registers)across blocks possible
• remember local liveness algo: determined liveness status pervar/temp at the end of each “line/instruction”73
72While assuming variables live, even if they are not, is safe, the oppositemay be unsafe. The code generator therefore must not reuse temporariesacross blocks when doing this assumption.
73For sake of making a parallel one could: consider each line as individualblock.
611 / 672
![Page 612: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/612.jpg)
Connecting blocks in the CFG: inLive and outLive
• CFG:• pretty conventional graph (nodes and edges, often designated
start and end node)74
• nodes = basic blocks = contain straight-line code (here 3AIC)• being conventional graphs:
• conventional representations possible• E.g. nodes with lists/sets/collections of immediate successor
nodes plus immediate predecessor nodes
• remember: local liveness status• can be different before and after one single instruction• liveness status before expressed as dependent on status after⇒ backward scan
• Now per block: inLive and outLive
74For some analyses resp. algos: assumed that the only cycles in the graphare loops. However, the techniques presented here work generally.
612 / 672
![Page 613: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/613.jpg)
inLive and outLive
• tracing / approximating set of live variables75 at the beginningand end per basic block
• inLive of a block: depends on• outLive of that block and• the SLC inside that block
• outLive of a block: depends on inLive of the successor blocks
Approximation: To err on the safe sideJudging a variable (statically) live: always safe. Judging wrongly avariable dead (which actually will be used): unsafe
• goal: smallest (but safe) possible sets for outLive (and inLive)
75to stress “approximation”: inLive and outLive contain sets of statically livevariables. If those are dynamically live or not is undecidable.
613 / 672
![Page 614: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/614.jpg)
Example: Faculty CFG
• inLive and outLive• picture shows arrows assuccessor nodes
• needed predecessor nodes(reverse arrows)
node/block predecessorsB1 ∅B2 {B1}B3 {B2,B3}B4 {B3}B5 {B1,B4}
614 / 672
![Page 615: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/615.jpg)
Block local info for global liveness/data flow analysis
• 1 CFG per procedure/function/method• as for SLG: algo works backwards• for each block: underlying block-local liveness analyis
3-values block local status per variableresult of block-local live variable analysis1. locally live on entry: variable used (before overwritten or not)2. locally dead on entry: variable overwritten (before used or not)3. status not locally determined: variable neither assigned to nor
read locally
• for efficiency: precompute this info, before starting the globaliteration ⇒ avoid recomputation for blocks in loops
• in the smallish examples: we often do it simply on-the-fly (by“looking at” the blocks’ SLC)
615 / 672
![Page 616: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/616.jpg)
Global DFA as iterative “completion algorithm”
• different names for general approach• closure algorithm• fixpoint iteration
• basically: a big loop with• iterating a step approaching an intended solution by making
current approximation of the solution larger• until the solution stabilizes
• similar (for example): calculation of first- and follow-sets• often: realized as worklist algo
• named after central data-structure containing the“work-still-to-be-done”
• here possible: worklist containing nodes untreated wrt.liveness analysis (or DFA in general)
616 / 672
![Page 617: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/617.jpg)
Example
a := 5L1 : x := 8
y := a + xi f_true x=0 goto L4z := a + x // B3a := y + zi f_ f a l s e a=0 goto L1a := a + 1 // B2y := 3 + x
L5 a := x + yr e s u l t := a + zre tu rn r e s u l t // B6
L4 : a := y + 8y := 3goto L5
617 / 672
![Page 618: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/618.jpg)
CFG: initialization
a:=5
x:=8y:=a+x
z:=a+xa:=y+z
x:=y+8
y:=3
a:=a+1y:=3+z
a:=x+yresult:=a+z
return result
B0
B1
B2
B3 B4
B5
B6
∅
∅
∅
∅ ∅
∅
∅
∅
∅
∅
∅ ∅
∅
∅
• inLive and outLive:initialized to ∅everywere
• note: start with(most) unsafeestimation
• extra (return) node• but: analysis herelocal per procedure,only
618 / 672
![Page 619: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/619.jpg)
Iterative algo
General schemaInitialization start with the “minimal” estimation (∅ everywhere)
Loop pick one node & update (= enlarge) livenessestimation in connection with that node
Until finish upon stabilization. no further enlargement
• order of treatment of nodes: in princple arbitrary76
• in tendency: following edges backwards• comparison: for linear graphs (like inside a block):
• no repeat-until-stabilize loop needed• 1 simple backward scan enough
76There may be more efficient and less efficient orders of treatment.619 / 672
![Page 620: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/620.jpg)
Liveness: run
a:=5
x:=8y:=a+x
z:=a+xa:=y+z
x:=y+8
y:=3
a:=a+1y:=3+z
a:=x+yresult:=a+z
return result
B0
B1
B2
B3 B4
B5
B6
∅
∅
∅
∅ ∅
∅
∅
∅
∅
∅
∅ ∅
∅
∅
620 / 672
![Page 621: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/621.jpg)
Liveness: run
a:=5
x:=8y:=a+x
z:=a+xa:=y+z
x:=y+8
y:=3
a:=a+1y:=3+z
a:=x+yresult:=a+z
return result
B0
B1
B2
B3 B4
B5
B6
∅
∅
∅
∅ ∅
∅
{r}
∅
∅
∅
∅ ∅
∅
∅
621 / 672
![Page 622: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/622.jpg)
Liveness: run
a:=5
x:=8y:=a+x
z:=a+xa:=y+z
x:=y+8
y:=3
a:=a+1y:=3+z
a:=x+yresult:=a+z
return result
B0
B1
B2
B3 B4
B5
B6
∅
∅
∅
∅ ∅
∅
{r}
∅
∅
∅
∅ ∅
{r}
∅
622 / 672
![Page 623: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/623.jpg)
Liveness: run
a:=5
x:=8y:=a+x
z:=a+xa:=y+z
x:=y+8
y:=3
a:=a+1y:=3+z
a:=x+yresult:=a+z
return result
B0
B1
B2
B3 B4
B5
B6
∅
∅
∅
∅ ∅
{x , y .z}
{r}
∅
∅
∅
∅ ∅
{r}
∅
623 / 672
![Page 624: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/624.jpg)
Liveness: run
a:=5
x:=8y:=a+x
z:=a+xa:=y+z
x:=y+8
y:=3
a:=a+1y:=3+z
a:=x+yresult:=a+z
return result
B0
B1
B2
B3 B4
B5
B6
∅
∅
∅
∅ ∅
{x , y .z}
{r}
∅
∅
{x , y , z}
∅ {x , y , z}
{r}
∅
624 / 672
![Page 625: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/625.jpg)
Liveness: run
a:=5
x:=8y:=a+x
z:=a+xa:=y+z
x:=y+8
y:=3
a:=a+1y:=3+z
a:=x+yresult:=a+z
return result
B0
B1
B2
B3 B4
B5
B6
∅
∅
∅
∅ {x , y , z}
{x , y .z}
{r}
∅
∅
{x , y , z}
∅ {x , y , z}
{r}
∅
625 / 672
![Page 626: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/626.jpg)
Liveness: run
a:=5
x:=8y:=a+x
z:=a+xa:=y+z
x:=y+8
y:=3
a:=a+1y:=3+z
a:=x+yresult:=a+z
return result
B0
B1
B2
B3 B4
B5
B6
∅
∅
∅
∅ {x , y , z}
{x , y .z}
{r}
∅
{x , y , z}
{x , y , z}
∅ {x , y , z}
{r}
∅
626 / 672
![Page 627: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/627.jpg)
Liveness: run
a:=5
x:=8y:=a+x
z:=a+xa:=y+z
x:=y+8
y:=3
a:=a+1y:=3+z
a:=x+yresult:=a+z
return result
B0
B1
B2
B3 B4
B5
B6
∅
∅
{a, x , z}
∅ {x , y , z}
{x , y .z}
{r}
∅
{x , y , z}
{x , y , z}
∅ {x , y , z}
{r}
∅
627 / 672
![Page 628: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/628.jpg)
Liveness: run
a:=5
x:=8y:=a+x
z:=a+xa:=y+z
x:=y+8
y:=3
a:=a+1y:=3+z
a:=x+yresult:=a+z
return result
B0
B1
B2
B3 B4
B5
B6
∅
∅
{a, x , z}
∅ {x , y , z}
{x , y .z}
{r}
∅
{x , y , z}
{x , y , z}
{a, z, x} {x , y , z}
{r}
∅
628 / 672
![Page 629: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/629.jpg)
Liveness: run
a:=5
x:=8y:=a+x
z:=a+xa:=y+z
x:=y+8
y:=3
a:=a+1y:=3+z
a:=x+yresult:=a+z
return result
B0
B1
B2
B3 B4
B5
B6
∅
∅
{a, x , z}
{a, x , y} {x , y , z}
{x , y .z}
{r}
∅
{x , y , z}
{x , y , z}
{a, z, x} {x , y , z}
{r}
∅
629 / 672
![Page 630: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/630.jpg)
Liveness: run
a:=5
x:=8y:=a+x
z:=a+xa:=y+z
x:=y+8
y:=3
a:=a+1y:=3+z
a:=x+yresult:=a+z
return result
B0
B1
B2
B3 B4
B5
B6
∅
∅
{a, x , z}
{a, x , y} {x , y , z}
{x , y .z}
{r}
∅
{a, x , y , z}
{x , y , z}
{a, z, x} {x , y , z}
{r}
∅
630 / 672
![Page 631: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/631.jpg)
Liveness: run
a:=5
x:=8y:=a+x
z:=a+xa:=y+z
x:=y+8
y:=3
a:=a+1y:=3+z
a:=x+yresult:=a+z
return result
B0
B1
B2
B3 B4
B5
B6
∅
{a, z}
{a, x , z}
{a, x , y} {x , y , z}
{x , y .z}
{r}
∅
{a, x , y , z}
{x , y , z}
{a, z, x} {x , y , z}
{r}
∅
631 / 672
![Page 632: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/632.jpg)
Liveness: run
a:=5
x:=8y:=a+x
z:=a+xa:=y+z
x:=y+8
y:=3
a:=a+1y:=3+z
a:=x+yresult:=a+z
return result
B0
B1
B2
B3 B4
B5
B6
∅
{a, z}
{a, x , z}
{a, x , y} {x , y , z}
{x , y .z}
{r}
{a, z}
{a, x , y , z}
{x , y , z}
{a, z, x} {x , y , z}
{r}
∅
632 / 672
![Page 633: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/633.jpg)
Liveness: run
a:=5
x:=8y:=a+x
z:=a+xa:=y+z
x:=y+8
y:=3
a:=a+1y:=3+z
a:=x+yresult:=a+z
return result
B0
B1
B2
B3 B4
B5
B6
{z}
{a, z}
{a, x , z}
{a, x , y} {x , y , z}
{x , y .z}
{r}
{a, z}
{a, x , y , z}
{x , y , z}
{a, z, x} {x , y , z}
{r}
∅
633 / 672
![Page 634: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/634.jpg)
Liveness example: remarks
• the shown traversal strategy is (cleverly) backwards• example resp. example run simplistic:• the loop (and the choice of “evaluation” order):
“harmless loop”after having updated the outLive info for B1 following the edgefrom B3 to B1 backwards (propagating flow from B1 back to B3)does not increase the current solution for B3
• no need (in this particular order) for continuing the iterativesearch for stabilization
• in other examples: loop iteration cannot be avoided• note also: end result (after stabilization) independent fromevaluation order! (only some strategies may stabilize faster. . . )
634 / 672
![Page 635: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/635.jpg)
Another example
x:=5y:=a-1
x:=y+8y:=a+x
a:=y+z x:=y+xy:=3
y:=3+z
a:=x+yresult:=a+1
return result
B0
B1
B2
B3 B4
B5
B6
∅
∅
∅
∅ ∅
∅
∅
∅
∅
∅
∅ ∅
∅
∅
635 / 672
![Page 636: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/636.jpg)
Another example
x:=5y:=a-1
x:=y+8y:=a+x
a:=y+z x:=y+xy:=3
y:=3+z
a:=x+yresult:=a+1
return result
B0
B1
B2
B3 B4
B5
B6
∅
∅
∅
∅ ∅
∅
{r}
∅
∅
∅
∅ ∅
∅
∅
636 / 672
![Page 637: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/637.jpg)
Another example
x:=5y:=a-1
x:=y+8y:=a+x
a:=y+z x:=y+xy:=3
y:=3+z
a:=x+yresult:=a+1
return result
B0
B1
B2
B3 B4
B5
B6
∅
∅
∅
∅ ∅
∅
{r}
∅
∅
∅
∅ ∅
{r}
∅
637 / 672
![Page 638: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/638.jpg)
Another example
x:=5y:=a-1
x:=y+8y:=a+x
a:=y+z x:=y+xy:=3
y:=3+z
a:=x+yresult:=a+1
return result
B0
B1
B2
B3 B4
B5
B6
∅
∅
∅
∅ ∅
{x , y}
{r}
∅
∅
∅
∅ ∅
{r}
∅
638 / 672
![Page 639: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/639.jpg)
Another example
x:=5y:=a-1
x:=y+8y:=a+x
a:=y+z x:=y+xy:=3
y:=3+z
a:=x+yresult:=a+1
return result
B0
B1
B2
B3 B4
B5
B6
∅
∅
∅
∅ ∅
{x , y}
{r}
∅
∅
{x , y}
∅ {x , y}
{r}
∅
639 / 672
![Page 640: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/640.jpg)
Another example
x:=5y:=a-1
x:=y+8y:=a+x
a:=y+z x:=y+xy:=3
y:=3+z
a:=x+yresult:=a+1
return result
B0
B1
B2
B3 B4
B5
B6
∅
∅
∅
∅ {x , y}
{x , y}
{r}
∅
∅
{x , y}
∅ {x , y}
{r}
∅
640 / 672
![Page 641: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/641.jpg)
Another example
x:=5y:=a-1
x:=y+8y:=a+x
a:=y+z x:=y+xy:=3
y:=3+z
a:=x+yresult:=a+1
return result
B0
B1
B2
B3 B4
B5
B6
∅
∅
∅
∅ {x , y}
{x , y}
{r}
∅
{x , y}
{x , y}
∅ {x , y}
{r}
∅
641 / 672
![Page 642: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/642.jpg)
Another example
x:=5y:=a-1
x:=y+8y:=a+x
a:=y+z x:=y+xy:=3
y:=3+z
a:=x+yresult:=a+1
return result
B0
B1
B2
B3 B4
B5
B6
∅
{y , a}
∅
∅ {x , y}
{x , y}
{r}
∅
{x , y}
{x , y}
∅ {x , y}
{r}
∅
642 / 672
![Page 643: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/643.jpg)
Another example
x:=5y:=a-1
x:=y+8y:=a+x
a:=y+z x:=y+xy:=3
y:=3+z
a:=x+yresult:=a+1
return result
B0
B1
B2
B3 B4
B5
B6
∅
{y , a}
∅
∅ {x , y}
{x , y}
{r}
{y , a}
{x , y}
{x , y , a}
∅ {x , y}
{r}
∅
643 / 672
![Page 644: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/644.jpg)
Another example
x:=5y:=a-1
x:=y+8y:=a+x
a:=y+z x:=y+xy:=3
y:=3+z
a:=x+yresult:=a+1
return result
B0
B1
B2
B3 B4
B5
B6
{a}
{y , a}
∅
∅ {x , y}
{x , y}
{r}
{y , a}
{x , y}
{x , y , a}
∅ {x , y}
{r}
∅
644 / 672
![Page 645: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/645.jpg)
Another example
x:=5y:=a-1
x:=y+8y:=a+x
a:=y+z x:=y+xy:=3
y:=3+z
a:=x+yresult:=a+1
return result
B0
B1
B2
B3 B4
B5
B6
{a}
{y , a}
{x , z, a}
∅ {x , y}
{x , y}
{r}
{y , a}
{a, x , y}
{x , y , a}
∅ {x , y}
{r}
∅
645 / 672
![Page 646: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/646.jpg)
Another example
x:=5y:=a-1
x:=y+8y:=a+x
a:=y+z x:=y+xy:=3
y:=3+z
a:=x+yresult:=a+1
return result
B0
B1
B2
B3 B4
B5
B6
{a}
{y , a}
{x , z, a}
∅ {x , y}
{x , y}
{r}
{y , a}
{a, x , y}
{x , y , a}
{x , z, a} {x , y}
{r}
∅
646 / 672
![Page 647: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/647.jpg)
Another example
x:=5y:=a-1
x:=y+8y:=a+x
a:=y+z x:=y+xy:=3
y:=3+z
a:=x+yresult:=a+1
return result
B0
B1
B2
B3 B4
B5
B6
{a}
{y , a}
{x , z, a}
{x , y , z} {x , y}
{x , y}
{r}
{y , a}
{a, x , y}
{x , y , a}
{x , z, a} {x , y}
{r}
∅
647 / 672
![Page 648: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/648.jpg)
Another example
x:=5y:=a-1
x:=y+8y:=a+x
a:=y+z x:=y+xy:=3
y:=3+z
a:=x+yresult:=a+1
return result
B0
B1
B2
B3 B4
B5
B6
{a}
{y , a}
{x , z, a}
{x , y , z} {x , y}
{x , y}
{r}
{y , a}
{a, x , y , z}
{x , y , a}
{x , z, a} {x , y}
{r}
∅
648 / 672
![Page 649: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/649.jpg)
Another example
x:=5y:=a-1
x:=y+8y:=a+x
a:=y+z x:=y+xy:=3
y:=3+z
a:=x+yresult:=a+1
return result
B0
B1
B2
B3 B4
B5
B6
{a}
{y , z, a}
{x , z, a}
{x , y , z} {x , y}
{x , y}
{r}
{y , a}
{a, x , y , z}
{x , y , a}
{x , z, a} {x , y}
{r}
∅
649 / 672
![Page 650: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/650.jpg)
Another example
x:=5y:=a-1
x:=y+8y:=a+x
a:=y+z x:=y+xy:=3
y:=3+z
a:=x+yresult:=a+1
return result
B0
B1
B2
B3 B4
B5
B6
{a}
{y , z, a}
{x , z, a}
{x , y , z} {x , y}
{x , y}
{r}
{y , z, a}
{a, x , y , z}
{x , y , z, a}
{x , z, a} {x , y}
{r}
∅
650 / 672
![Page 651: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/651.jpg)
Another example
x:=5y:=a-1
x:=y+8y:=a+x
a:=y+z x:=y+xy:=3
y:=3+z
a:=x+yresult:=a+1
return result
B0
B1
B2
B3 B4
B5
B6
{a, z}
{y , z, a}
{x , z, a}
{x , y , z} {x , y}
{x , y}
{r}
{y , z, a}
{a, x , y , z}
{x , y , z, a}
{x , z, a} {x , y}
{r}
∅
651 / 672
![Page 652: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/652.jpg)
Example remarks
• loop: this time leads to updating estimation more than once• evaluation order not chose ideally
652 / 672
![Page 653: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/653.jpg)
Precomputing the block-local “liveness effects”
• precomputation of the relevant info: efficiency• traditionally: represented as kill and generate information• here (for liveness)
1. kill: variable instances, which are overwritten2. generate: variables used in the block (before overwritten)3. rests: all other variables won’t change their status
Constraint per basic block (transfer function)
inLive = outLive/kill(B) ∪ generate(B)
• note:• order of kill and generate in above’s equation77
• a variable killed in a block may be “revived” in a block• simplest (one line) example: x := x +177In principle, one could also arrange the opposite order (interpreting kill and
generatate slightly differently). One can also define the so-called transferfunction directly, without splitting into kill and generate. The phrasing usingsuch tranfer functions: works for other DFA as well.
653 / 672
![Page 654: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/654.jpg)
Example once again
k: {x , y}, g: {a}
k: {x , y}, g: {a, y}
k: {a}, g: {y , z} k: {x , y}, g: {x , y}
k: {y}, g: {z}
k: {r , a}, g: {x , y}
k: {}, g: {r}
B0
B1
B2
B3 B4
B5
B6
∅
∅
∅
∅ ∅
∅
∅
∅
∅
∅
∅ ∅
∅
∅
654 / 672
![Page 655: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/655.jpg)
Example once again
k: {x , y}, g: {a}
k: {x , y}, g: {a, y}
k: {a}, g: {y , z} k: {x , y}, g: {x , y}
k: {y}, g: {z}
k: {r , a}, g: {x , y}
k: {}, g: {r}
B0
B1
B2
B3 B4
B5
B6
∅
∅
∅
∅ ∅
∅
{r}
∅
∅
∅
∅ ∅
∅
∅
655 / 672
![Page 656: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/656.jpg)
Example once again
k: {x , y}, g: {a}
k: {x , y}, g: {a, y}
k: {a}, g: {y , z} k: {x , y}, g: {x , y}
k: {y}, g: {z}
k: {r , a}, g: {x , y}
k: {}, g: {r}
B0
B1
B2
B3 B4
B5
B6
∅
∅
∅
∅ ∅
∅
{r}
∅
∅
∅
∅ ∅
{r}
∅
656 / 672
![Page 657: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/657.jpg)
Example once again
k: {x , y}, g: {a}
k: {x , y}, g: {a, y}
k: {a}, g: {y , z} k: {x , y}, g: {x , y}
k: {y}, g: {z}
k: {r , a}, g: {x , y}
k: {}, g: {r}
B0
B1
B2
B3 B4
B5
B6
∅
∅
∅
∅ ∅
{x , y}
{r}
∅
∅
∅
∅ ∅
{r}
∅
657 / 672
![Page 658: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/658.jpg)
Example once again
k: {x , y}, g: {a}
k: {x , y}, g: {a, y}
k: {a}, g: {y , z} k: {x , y}, g: {x , y}
k: {y}, g: {z}
k: {r , a}, g: {x , y}
k: {}, g: {r}
B0
B1
B2
B3 B4
B5
B6
∅
∅
∅
∅ ∅
{x , y}
{r}
∅
∅
{x , y}
∅ {x , y}
{r}
∅
658 / 672
![Page 659: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/659.jpg)
Example once again
k: {x , y}, g: {a}
k: {x , y}, g: {a, y}
k: {a}, g: {y , z} k: {x , y}, g: {x , y}
k: {y}, g: {z}
k: {r , a}, g: {x , y}
k: {}, g: {r}
B0
B1
B2
B3 B4
B5
B6
∅
∅
∅
∅ {x , y}
{x , y}
{r}
∅
∅
{x , y}
∅ {x , y}
{r}
∅
659 / 672
![Page 660: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/660.jpg)
Example once again
k: {x , y}, g: {a}
k: {x , y}, g: {a, y}
k: {a}, g: {y , z} k: {x , y}, g: {x , y}
k: {y}, g: {z}
k: {r , a}, g: {x , y}
k: {}, g: {r}
B0
B1
B2
B3 B4
B5
B6
∅
∅
∅
∅ {x , y}
{x , y}
{r}
∅
{x , y}
{x , y}
∅ {x , y}
{r}
∅
660 / 672
![Page 661: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/661.jpg)
Example once again
k: {x , y}, g: {a}
k: {x , y}, g: {a, y}
k: {a}, g: {y , z} k: {x , y}, g: {x , y}
k: {y}, g: {z}
k: {r , a}, g: {x , y}
k: {}, g: {r}
B0
B1
B2
B3 B4
B5
B6
∅
{y , a}
∅
∅ {x , y}
{x , y}
{r}
∅
{x , y}
{x , y}
∅ {x , y}
{r}
∅
661 / 672
![Page 662: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/662.jpg)
Example once again
k: {x , y}, g: {a}
k: {x , y}, g: {a, y}
k: {a}, g: {y , z} k: {x , y}, g: {x , y}
k: {y}, g: {z}
k: {r , a}, g: {x , y}
k: {}, g: {r}
B0
B1
B2
B3 B4
B5
B6
∅
{y , a}
∅
∅ {x , y}
{x , y}
{r}
{y , a}
{x , y}
{x , y , a}
∅ {x , y}
{r}
∅
662 / 672
![Page 663: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/663.jpg)
Example once again
k: {x , y}, g: {a}
k: {x , y}, g: {a, y}
k: {a}, g: {y , z} k: {x , y}, g: {x , y}
k: {y}, g: {z}
k: {r , a}, g: {x , y}
k: {}, g: {r}
B0
B1
B2
B3 B4
B5
B6
{a}
{y , a}
∅
∅ {x , y}
{x , y}
{r}
{y , a}
{x , y}
{x , y , a}
∅ {x , y}
{r}
∅
663 / 672
![Page 664: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/664.jpg)
Example once again
k: {x , y}, g: {a}
k: {x , y}, g: {a, y}
k: {a}, g: {y , z} k: {x , y}, g: {x , y}
k: {y}, g: {z}
k: {r , a}, g: {x , y}
k: {}, g: {r}
B0
B1
B2
B3 B4
B5
B6
{a}
{y , a}
{x , z, a}
∅ {x , y}
{x , y}
{r}
{y , a}
{a, x , y}
{x , y , a}
∅ {x , y}
{r}
∅
664 / 672
![Page 665: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/665.jpg)
Example once again
k: {x , y}, g: {a}
k: {x , y}, g: {a, y}
k: {a}, g: {y , z} k: {x , y}, g: {x , y}
k: {y}, g: {z}
k: {r , a}, g: {x , y}
k: {}, g: {r}
B0
B1
B2
B3 B4
B5
B6
{a}
{y , a}
{x , z, a}
∅ {x , y}
{x , y}
{r}
{y , a}
{a, x , y}
{x , y , a}
{x , z, a} {x , y}
{r}
∅
665 / 672
![Page 666: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/666.jpg)
Example once again
k: {x , y}, g: {a}
k: {x , y}, g: {a, y}
k: {a}, g: {y , z} k: {x , y}, g: {x , y}
k: {y}, g: {z}
k: {r , a}, g: {x , y}
k: {}, g: {r}
B0
B1
B2
B3 B4
B5
B6
{a}
{y , a}
{x , z, a}
{x , y , z} {x , y}
{x , y}
{r}
{y , a}
{a, x , y}
{x , y , a}
{x , z, a} {x , y}
{r}
∅
666 / 672
![Page 667: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/667.jpg)
Example once again
k: {x , y}, g: {a}
k: {x , y}, g: {a, y}
k: {a}, g: {y , z} k: {x , y}, g: {x , y}
k: {y}, g: {z}
k: {r , a}, g: {x , y}
k: {}, g: {r}
B0
B1
B2
B3 B4
B5
B6
{a}
{y , a}
{x , z, a}
{x , y , z} {x , y}
{x , y}
{r}
{y , a}
{a, x , y , z}
{x , y , a}
{x , z, a} {x , y}
{r}
∅
667 / 672
![Page 668: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/668.jpg)
Example once again
k: {x , y}, g: {a}
k: {x , y}, g: {a, y}
k: {a}, g: {y , z} k: {x , y}, g: {x , y}
k: {y}, g: {z}
k: {r , a}, g: {x , y}
k: {}, g: {r}
B0
B1
B2
B3 B4
B5
B6
{a}
{y , z, a}
{x , z, a}
{x , y , z} {x , y}
{x , y}
{r}
{y , a}
{a, x , y , z}
{x , y , a}
{x , z, a} {x , y}
{r}
∅
668 / 672
![Page 669: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/669.jpg)
Example once again
k: {x , y}, g: {a}
k: {x , y}, g: {a, y}
k: {a}, g: {y , z} k: {x , y}, g: {x , y}
k: {y}, g: {z}
k: {r , a}, g: {x , y}
k: {}, g: {r}
B0
B1
B2
B3 B4
B5
B6
{a}
{y , z, a}
{x , z, a}
{x , y , z} {x , y}
{x , y}
{r}
{y , z, a}
{a, x , y , z}
{x , y , z, a}
{x , z, a} {x , y}
{r}
∅
669 / 672
![Page 670: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/670.jpg)
Example once again
k: {x , y}, g: {a}
k: {x , y}, g: {a, y}
k: {a}, g: {y , z} k: {x , y}, g: {x , y}
k: {y}, g: {z}
k: {r , a}, g: {x , y}
k: {}, g: {r}
B0
B1
B2
B3 B4
B5
B6
{a, z}
{y , z, a}
{x , z, a}
{x , y , z} {x , y}
{x , y}
{r}
{y , z, a}
{a, x , y , z}
{x , y , z, a}
{x , z, a} {x , y}
{r}
∅
670 / 672
![Page 671: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/671.jpg)
Outline
1. Introduction
2. Scanning
3. Grammars
4. Parsing
5. Semantic analysis
6. Symbol tables
7. Types and type checking
8. Run-time environments
9. Intermediate code generation
10. Code generation
671 / 672
![Page 672: INF5110 Compiler Construction...Introduction 2. Scanning 3. Grammars 4. Parsing 5. Semanticanalysis 6. Symboltables 7. Typesandtypechecking 8. Run-timeenvironments 9. Intermediatecodegeneration](https://reader035.vdocuments.us/reader035/viewer/2022070107/602541eb1d08a85ae10e73f7/html5/thumbnails/672.jpg)
References I
[Aho et al., 2007] Aho, A. V., Lam, M. S., Sethi, R., and Ullman, J. D. (2007).Compilers: Principles, Techniques and Tools.Pearson,Addison-Wesley, second edition.
[Aho et al., 1986] Aho, A. V., Sethi, R., and Ullman, J. D. (1986).Compilers: Principles, Techniques and Tools.Addison-Wesley.
[Chomsky, 1956] Chomsky, N. (1956).: Three models for the description of language.IRE Transactions on Information Theory, 2(113–124).
[Hopcroft, 1971] Hopcroft, J. E. (1971).An n log n algorithm for minimizing the states in a finite automaton.In Kohavi, Z., editor, The Theory of Machines and Computations, pages 189–196. Academic Press,New York.
[Kleene, 1956] Kleene, S. C. (1956).Representation of events in nerve nets and finite automata.In Automata Studies, pages 3–42. Princeton University Press.
[Louden, 1997] Louden, K. (1997).Compiler Construction, Principles and Practice.PWS Publishing.
[Rabin and Scott, 1959] Rabin, M. and Scott, D. (1959).Finite automata and their decision problems.IBM Journal of Research Developments, 3:114–125.
[Thompson, 1968] Thompson, K. (1968).Programming techniques: Regular expression search algorithm.Communications of the ACM, 11(6):419.
672 / 672