CS502: Compiler Design Intermediate Code Generation Manas Thakur Fall 2020

Post on 15-May-2022


Page 1: Intermediate Code Generation Manas Thakur

CS502: Compiler Design

Intermediate Code Generation

Manas Thakur

Fall 2020


Midway through the course!

[Figure: phases of a compiler, from input to output]

Character stream
    -> Lexical Analyzer
    -> Token stream
    -> Syntax Analyzer
    -> Syntax tree
    -> Semantic Analyzer
    -> Syntax tree
    -> Intermediate Code Generator
    -> Intermediate representation
    -> Machine-Independent Code Optimizer
    -> Intermediate representation
    -> Code Generator
    -> Target machine code
    -> Machine-Dependent Code Optimizer
    -> Target machine code

Side labels in the figure: Front end, Back end, Symbol Table.


Roles of IR Generator

● Act as a glue between front-end and back-end

– Or source and machine codes

● Lower abstraction from source level

– To make life simple

● Maintain some high-level information

– To keep life interesting

● Make the dream of m+n components for m languages and n platforms look like a possibility

– Scala to Java Bytecode, for example

● Enable machine-independent optimization

– Next phase


Intermediate Representations (IR)

● IR design affects compiler speed and capabilities

● Some important IR properties:

– Ease of generation, manipulation, optimization

– Size of the representation

– Level of abstraction: level of detail in the IR
    ● How close is the IR to source code? To the machine?
    ● What kinds of operations are represented?

● Often, different IRs for different jobs:

– High-level IR: close to the source language

– Low-level IR: close to the assembly code

– Some compilers even have mid-level IRs!


Kinds of IRs

● Structural
– Graph oriented
– Heavily used in IDEs, source-to-source translators
– Tend to be large
– Examples: ASTs, DAGs

● Linear
– Pseudo-code for an abstract machine
– Level of abstraction varies
– Simple, compact data structures
– Examples: 3 address code, Bytecode (Stack machine)

● Hybrid
– Combination of graphs and linear code
– Examples: Control-flow graphs, Ideal IR (HotSpot C2)


Abstract Syntax Tree (AST)

● Parse tree with some intermediate nodes removed

● Advantages:

– Easy to evaluate
    ● Postfix form: x 2 y * -
    ● Useful for interpretation
– Source code can be reconstructed
    ● Helpful in program understanding

AST for x – 2 * y:

        -
       / \
      x   *
         / \
        2   y


Directed Acyclic Graph (DAG)

● AST with a unique node for each value

● Advantages:

– Compact (reduces redundancy)

– Won’t have to evaluate the same expression twice

Example: a + a * (b – c) + (b – c) * d

[Figure: the AST for this expression becomes a DAG in which the leaf a and the subexpression b – c each appear as a single shared node.]
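One common way to obtain "a unique node for each value" is to look up each (operator, children) signature in a table before creating a node. The sketch below assumes that approach; `DagBuilder` and its method names are hypothetical and only meant to illustrate the idea on the slide's expression.

```python
class DagBuilder:
    """Reuses a node whenever the same (op, left, right) signature recurs."""

    def __init__(self):
        self.nodes = []   # node id -> signature
        self.index = {}   # signature -> node id

    def node(self, op, left=None, right=None):
        key = (op, left, right)
        if key not in self.index:
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]

    def leaf(self, name):
        return self.node(name)


dag = DagBuilder()
a, b, c, d = (dag.leaf(name) for name in "abcd")

# a + a * (b - c) + (b - c) * d
first = dag.node("+", a, dag.node("*", a, dag.node("-", b, c)))
second = dag.node("*", dag.node("-", b, c), d)   # b - c is found, not rebuilt
root = dag.node("+", first, second)

print(len(dag.nodes))   # number of distinct nodes in the DAG
```

The corresponding AST has 13 nodes (7 leaves and 6 operators); sharing a and b - c leaves 9 in the DAG.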


Three Address Code (3AC or TAC)

● At most

– Three addresses (names/constants) in the instruction

– One operator on the right hand side of assignment

● General statement form: x = y op z

● Longer expressions are simplified by introducing temporaries

● Advantages:

– Easy to understand

– Names for intermediate values

Example: z = x – 2 * y becomes

    t1 = 2 * y
    t2 = x – t1
    z = t2

or

    t1 = 2 * y
    z = x – t1


More about 3AC

● Allows variety of instructions:

– Assignments
    ● x = y op z
    ● x = op y
    ● x = y
    ● x = y[i] and x[i] = y
    ● x = y.f and x.f = y

– Branches
    ● goto L
    ● if x goto L

– Procedure calls
    ● param x1; param x2; ...; param xn; call p, n

– Pointer assignments


Classwork: Generate 3AC

● r = a + a * (b – c) + (b – c) * d

● if (x < y) S1 else S2

● while (x < 10) S1

Answers:

r = a + a * (b – c) + (b – c) * d

    t1 = b - c
    t2 = t1 * d
    t3 = b - c
    t4 = a * t3
    t5 = t4 + t2
    r = a + t5

if (x < y) S1 else S2

    t1 = x < y
    if !t1 goto L1
    S1
    goto L2
    L1: S2
    L2:

while (x < 10) S1

    L1: c = x < 10
    t = !c
    if t goto L2
    S1
    goto L1
    L2:


3AC Representations

Assignment: a = b * -c + d * -e

3AC:

    t1 = minus c
    t2 = b * t1
    t3 = minus e
    t4 = d * t3
    t5 = t2 + t4
    a = t5

● Quadruples: instructions can be reordered easily.

    op      arg1    arg2    result
    minus   c               t1
    *       b       t1      t2
    minus   e               t3
    *       d       t3      t4
    +       t2      t4      t5
    =       t5              a

● Triples: instructions cannot be reordered easily.

         op      arg1    arg2
    0    minus   c
    1    *       b       (0)
    2    minus   e
    3    *       d       (2)
    4    +       (1)     (3)
    5    =       a       (4)


3AC Representations (Cont.)

Assignment: a = b * -c + d * -e

3AC:

    t1 = minus c
    t2 = b * t1
    t3 = minus e
    t4 = d * t3
    t5 = t2 + t4
    a = t5

● Triples: instructions cannot be reordered easily.

         op      arg1    arg2
    0    minus   c
    1    *       b       (0)
    2    minus   e
    3    *       d       (2)
    4    +       (1)     (3)
    5    =       a       (4)

● Quadruples

● Indirect triples: can be reordered easily.

    position    instruction list    instruction list (reordered)
    0           (0)                 (2)
    1           (1)                 (3)
    2           (2)                 (0)
    3           (3)                 (1)
    4           (4)                 (4)
    5           (5)                 (5)
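The three representations can be compared on the same assignment with a small sketch. The names `Quad`, `Triple`, `quads_to_triples` and `is_valid_order` are assumptions made for this illustration; references to other triples are encoded as integers, standing in for the (0), (1), ... notation.

```python
from collections import namedtuple

Quad = namedtuple("Quad", "op arg1 arg2 result")
Triple = namedtuple("Triple", "op arg1 arg2")

# a = b * -c + d * -e
quads = [
    Quad("minus", "c", None, "t1"),
    Quad("*", "b", "t1", "t2"),
    Quad("minus", "e", None, "t3"),
    Quad("*", "d", "t3", "t4"),
    Quad("+", "t2", "t4", "t5"),
    Quad("=", "t5", None, "a"),
]


def quads_to_triples(quads):
    """Replace each temporary by the position of the triple computing it."""
    position = {}          # temporary name -> index of its defining triple
    triples = []
    for q in quads:
        arg1 = position.get(q.arg1, q.arg1)
        arg2 = position.get(q.arg2, q.arg2)
        if q.op == "=":    # copy into a named variable: (=, target, source)
            triples.append(Triple("=", q.result, arg1))
        else:
            triples.append(Triple(q.op, arg1, arg2))
            position[q.result] = len(triples) - 1
    return triples


def is_valid_order(order, triples):
    """An instruction list is valid if every referenced triple runs earlier."""
    seen = set()
    for i in order:
        refs = [arg for arg in triples[i][1:] if isinstance(arg, int)]
        if any(r not in seen for r in refs):
            return False
        seen.add(i)
    return True


triples = quads_to_triples(quads)

# Indirect triples: reordering only permutes this list of positions;
# the triples themselves (and the references inside them) stay untouched.
reordered = [2, 3, 0, 1, 4, 5]
print(is_valid_order(reordered, triples))
```

Moving a plain triple would change its position and invalidate every reference to it, which is why the extra instruction list is what makes reordering cheap.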


2 Address Code

● Where have you seen them?

– Common in Assembly

● Example: z = x – 2 * y becomes

    MOV R1, y
    MUL R1, 2
    MOV R2, x
    SUB R2, R1
    MOV z, R2

● Larger number of instructions compared to 3AC

● Good for register allocation


1 Address Code

● Stack-based computers

● Example: Java Virtual Machines!

● Advantages:

– Simple to generate and execute

– Compact form

● There is a reason you find Java based systems popular in:

– Embedded systems
– Mobile phones (Android)
– Systems where code is transmitted (Internet)

Example: x – 2 * y becomes

    push x
    push 2
    push y
    multiply
    subtract
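To see why such code is simple to execute, here is a minimal sketch of an interpreter for the five instructions above. The function `run`, the textual instruction format, and the sample values for x and y are assumptions for illustration; this is not how a real JVM is structured.

```python
def run(program, env):
    """Execute one-address (stack) code; env maps variable names to values."""
    stack = []
    for instr in program:
        op, *args = instr.split()
        if op == "push":
            arg = args[0]
            # A name is looked up in env, anything else is an integer literal.
            stack.append(env[arg] if arg in env else int(arg))
        elif op in ("multiply", "subtract"):
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right if op == "multiply" else left - right)
        else:
            raise ValueError(f"unknown instruction: {instr}")
    return stack.pop()


program = ["push x", "push 2", "push y", "multiply", "subtract"]
result = run(program, {"x": 10, "y": 3})
print(result)
```

Operators name no operands at all: they always take them from the top of the stack, which is what keeps the encoding compact.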


What next?

● More IRs (while learning CGO):

– Control-Flow Graph (CFG)

– Static Single Assignment (SSA)

● Next class: IR generation

– Focus: 3AC. Why?
    ● Comfortable and still affordable!
    ● Offers a wide understanding of the involved challenges.
    ● Assignment 3 would involve 3AC generation!
        – But there is time for it.


CS502: Compiler Design

Intermediate Code Generation (Cont.)

Manas Thakur

Fall 2020


IR Generation

● High level language is complex

● Goal: Lower HLL code to a simpler form (3AC)

● Constructs that we need to translate:

– Variable declarations

– Expressions

– Array accesses

– Control structures (conditionals, loops)

– Function calls

– Function bodies

– Classes and objects!

● Approach: Syntax-directed translation from parse tree.


Variable declarations

● Use symbol tables

– Maps from names to values

● Take care of nested scopes

– What will you do at the entry to a new block?

– What to do at a function call?

– Function entry?

– Function exit?

– Need to push and pop the current environment.

● Fields of a structure/class?

– We will study in detail when we learn translating objects.
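Pushing and popping the current environment can be sketched as a stack of dictionaries. The class `SymbolTable` and its method names are hypothetical, and the stored "info" is just a type name here; a real table would hold more (offsets, widths, and so on).

```python
class SymbolTable:
    """A stack of scopes; the innermost scope is the last list element."""

    def __init__(self):
        self.scopes = [{}]

    def push_scope(self):
        """Called on entry to a block or function body."""
        self.scopes.append({})

    def pop_scope(self):
        """Called on exit from a block or function body."""
        if len(self.scopes) == 1:
            raise RuntimeError("cannot pop the outermost scope")
        self.scopes.pop()

    def declare(self, name, info):
        if name in self.scopes[-1]:
            raise KeyError(f"{name} is already declared in this scope")
        self.scopes[-1][name] = info

    def get(self, name):
        """Search from the innermost scope outwards."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise KeyError(f"{name} is not declared")


tab = SymbolTable()
tab.declare("x", "int")
tab.push_scope()            # entering a nested block
tab.declare("x", "float")   # shadows the outer x
inner = tab.get("x")
tab.pop_scope()             # leaving the block restores the outer x
outer = tab.get("x")
print(inner, outer)
```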


Lowering scheme

● Code template for each AST node

– Captures key semantics of each construct

– Has blanks for the node’s children

– Implemented in a function called gen

● To fill in the template:

– Call the function gen recursively on children
    ● Did anyone say “visitors”?

– Plug code into the blanks

● How to stitch code together?

– gen stores the results into a temporary

– Emit code that combines the results for the syntactic construct represented by the current node


Translating expressions

Say E.addr is a synthesized attribute that denotes the temporary holding the value of E.

Construct       Translation
E -> E1 + E2    E.addr = newtemp();
                gen(E.addr '=' E1.addr '+' E2.addr)

In terms of our assignment:

Construct       visit() method
E -> E1 + E2    t1 = visit(E1);
                t2 = visit(E2);
                r = newtemp();
                System.out.println(r + " = " + t1 + " + " + t2);
                return r;

In terms of an attribute E.code:

Construct       Translation
E -> E1 + E2    E.addr = newtemp();
                E.code = E1.code || E2.code || gen(E.addr '=' E1.addr '+' E2.addr)


Translating expressions (Cont.)

● symTab is the symbol table of the current scope.

Construct       Translation
S -> id = E     gen(symTab.get(id.lexeme) '=' E.addr)
E -> -E1        E.addr = newtemp()
                gen(E.addr '=' '-' E1.addr)
E -> (E1)       E.addr = E1.addr
E -> id         E.addr = symTab.get(id.lexeme)


Example

● 3AC for a = b + -c, using the rules for S -> id = E, E -> E1 + E2, E -> -E1, E -> (E1) and E -> id from the previous two slides:

    t1 = - c
    t2 = b + t1
    a = t2
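The translation rules for expressions can be tried out with a small recursive generator. This is a sketch under assumptions: the tuple encoding of the tree, the class `ExprGen`, and the dictionary used as symTab are illustrative choices, not the course's assignment framework.

```python
class ExprGen:
    def __init__(self, sym_tab):
        self.sym_tab = sym_tab   # identifier lexeme -> name used in the 3AC
        self.code = []
        self.count = 0

    def newtemp(self):
        self.count += 1
        return f"t{self.count}"

    def emit(self, line):
        self.code.append(line)

    def expr(self, node):
        """Returns E.addr and emits the code that computes it."""
        kind = node[0]
        if kind == "id":                      # E -> id
            return self.sym_tab[node[1]]
        if kind == "paren":                   # E -> (E1)
            return self.expr(node[1])
        if kind == "neg":                     # E -> -E1
            operand = self.expr(node[1])
            addr = self.newtemp()
            self.emit(f"{addr} = - {operand}")
            return addr
        if kind == "+":                       # E -> E1 + E2
            left = self.expr(node[1])
            right = self.expr(node[2])
            addr = self.newtemp()
            self.emit(f"{addr} = {left} + {right}")
            return addr
        raise ValueError(f"unknown node kind: {kind}")

    def assign(self, name, node):             # S -> id = E
        addr = self.expr(node)
        self.emit(f"{self.sym_tab[name]} = {addr}")


gen = ExprGen({"a": "a", "b": "b", "c": "c"})
# a = b + -c
gen.assign("a", ("+", ("id", "b"), ("neg", ("id", "c"))))
print("\n".join(gen.code))
```

Each call handles only its own node and relies on the returned address of its children, which mirrors how E.addr is synthesized bottom-up.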


Translating array references

● Each type has a width (e.g., int may have 4)

● How do you get the relative address (from base) of the ith element of an array A, that is, A[i]?

– base + i * w

● What about A[i][j]?

– base + i1 * w1 + i2 * w2

● In general for a k-dimension array:

– base + i1 * w1 + i2 * w2 + ... + ik * wk

● Note: We are assuming row-major order.
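As a worked instance of the formula, the sketch below computes the widths w1 ... wk for a row-major array and then the address of one element. The function names, the base address 1000, and the 2 x 3 integer array are assumptions picked for the example.

```python
def widths(dims, elem_width):
    """w_j for each dimension: the size of one element of the j-th subarray."""
    result = []
    w = elem_width
    for n in reversed(dims):
        result.append(w)   # width of what a subscript at this level selects
        w *= n             # one level up, an element is a whole row of these
    return list(reversed(result))


def address(base, indices, dims, elem_width):
    """base + i1 * w1 + i2 * w2 + ... + ik * wk"""
    ws = widths(dims, elem_width)
    return base + sum(i * w for i, w in zip(indices, ws))


# A 2 x 3 array of 4-byte integers: a row is 12 bytes, an element is 4.
print(widths((2, 3), 4))
print(address(1000, (1, 2), (2, 3), 4))
```

For A[1][2] this gives 1000 + 1 * 12 + 2 * 4 = 1020.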


Translating array references (Cont.)

● Say we have the following grammar rule for generating a possibly multidimensional array reference:

    L -> L[E] | id [E]

● Say we have the following attributes:

– L.addr: a temporary that holds the offset for the array reference
– L.array: pointer to the symTab entry for the array name
– L.array.base: actual location of the array reference
– L.type: type of the subarray generated by L
– t.width: width of type t
– t.elem: type of the elements of array type t


Translating array references (Cont.)

Construct       Translation

S -> L = E      gen(L.array.base '[' L.addr ']' '=' E.addr)

E -> L          E.addr = newtemp()
                gen(E.addr '=' L.array.base '[' L.addr ']')

L -> id[E]      L.array = symTab.get(id.lexeme)
                L.type = L.array.type.elem
                L.addr = newtemp()
                gen(L.addr '=' E.addr '*' L.type.width)

L -> L1[E]      L.array = L1.array
                L.type = L1.type.elem
                t = newtemp()
                L.addr = newtemp()
                gen(t '=' E.addr '*' L.type.width)
                gen(L.addr '=' L1.addr '+' t)


Example

● 3AC for c + a[i][j],

– where type of a is array(2, array(3, integer)),
– and width of integer is 4.

    t1 = i * 12
    t2 = j * 4
    t3 = t1 + t2
    t4 = a[t3]
    t5 = c + t4

The translation rules used are those from the previous slide (S -> L = E, E -> L, L -> id[E], L -> L1[E]).

Section 6.4 (DB)
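The array rules can be exercised on the same example with a short sketch. The type encoding (`INT`, `array`, `width`), the class `ArrayGen`, and the tuple form of the tree are assumptions for illustration; scalar identifiers are plain strings to keep the example small.

```python
INT = ("integer",)


def array(n, elem):
    return ("array", n, elem)


def width(t):
    return 4 if t == INT else t[1] * width(t[2])


class ArrayGen:
    def __init__(self, sym_tab):
        self.sym_tab = sym_tab   # array name -> type
        self.code = []
        self.count = 0

    def newtemp(self):
        self.count += 1
        return f"t{self.count}"

    def emit(self, line):
        self.code.append(line)

    def lvalue(self, node):
        """node is ("index", target, subscript); returns (array, type, addr)."""
        _, target, subscript = node
        if isinstance(target, str):                    # L -> id[E]
            idx = self.expr(subscript)
            elem = self.sym_tab[target][2]
            addr = self.newtemp()
            self.emit(f"{addr} = {idx} * {width(elem)}")
            return target, elem, addr
        name, l1_type, l1_addr = self.lvalue(target)   # L -> L1[E]
        idx = self.expr(subscript)
        elem = l1_type[2]
        t = self.newtemp()
        addr = self.newtemp()
        self.emit(f"{t} = {idx} * {width(elem)}")
        self.emit(f"{addr} = {l1_addr} + {t}")
        return name, elem, addr

    def expr(self, node):
        if isinstance(node, str):                      # E -> id
            return node
        if node[0] == "index":                         # E -> L
            name, _, offset = self.lvalue(node)
            addr = self.newtemp()
            self.emit(f"{addr} = {name}[{offset}]")
            return addr
        left = self.expr(node[1])                      # E -> E1 + E2
        right = self.expr(node[2])
        addr = self.newtemp()
        self.emit(f"{addr} = {left} + {right}")
        return addr


agen = ArrayGen({"a": array(2, array(3, INT))})
# c + a[i][j]
agen.expr(("+", "c", ("index", ("index", "a", "i"), "j")))
print("\n".join(agen.code))
```

The multiplier 12 comes from L.type.width after the first subscript (a row of three integers), and 4 from the element type after the second.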


CS502: Compiler Design

Intermediate Code Generation (Cont.)

Manas Thakur

Fall 2020


Control flow

● By default straight line

– One statement after another

● Conditionals

– if, if-else, switch

● Loops

– while, do-while, for, repeat-until

● But we first need to consider:

– Boolean expressions

– Jumps (gotos) and labels


Translating boolean expressions

● B -> B || B | B && B | !B | E relop E | true | false

● relop -> < | <= | > | >= | == | !=

● How to optimize the evaluation of || and &&?

– Short-circuiting

– We need to keep that in mind in order to generate efficient 3AC.

– When can skipping short-circuit evaluation affect correctness?


Translating boolean expressions (Cont.)

Say, apart from the synthesized attribute code, each Boolean expression has two inherited attributes, true and false.

Construct         Translation

B -> B1 || B2     B1.true = B.true
                  B1.false = newlabel()
                  B2.true = B.true
                  B2.false = B.false
                  B.code = B1.code || label(B1.false) || B2.code

B -> B1 && B2     B1.true = newlabel()
                  B1.false = B.false
                  B2.true = B.true
                  B2.false = B.false
                  B.code = B1.code || label(B1.true) || B2.code

We will see how B.true and B.false are set after two slides.


Translating boolean expressions (Cont.)

Construct            Translation

B -> !B1             B1.true = B.false
                     B1.false = B.true
                     B.code = B1.code

B -> E1 relop E2     B.code = E1.code || E2.code
                              || gen('if' E1.addr relop E2.addr 'goto' B.true)
                              || gen('goto' B.false)

B -> true            B.code = gen('goto' B.true)

B -> false           B.code = gen('goto' B.false)


Translating control-flow expressions

Construct                  Translation

S -> if (B) S1             B.true = newlabel()
                           B.false = S1.next = S.next
                           S.code = B.code || label(B.true) || S1.code

S -> if (B) S1 else S2     B.true = newlabel()
                           B.false = newlabel()
                           S1.next = S2.next = S.next
                           S.code = B.code || label(B.true) || S1.code
                                    || gen('goto' S.next)
                                    || label(B.false) || S2.code

Notice that next is another inherited attribute with each statement.


Translating control-flow expressions (Cont.)

Construct             Translation

S -> while (B) S1     begin = newlabel()
                      B.true = newlabel()
                      B.false = S.next
                      S1.next = begin
                      S.code = label(begin) || B.code || label(B.true)
                               || S1.code || gen('goto' begin)

S -> S1 S2            S1.next = newlabel()
                      S2.next = S.next
                      S.code = S1.code || label(S1.next) || S2.code


3AC for if (x < 100 || x > 200 && x != y) x = 0

Straightforward translation, using the rules for B -> B1 || B2, B -> B1 && B2, B -> E1 relop E2, S -> if (B) S1 and S -> S1 S2 from the preceding slides:

    if x < 100 goto L2
    goto L3
    L3: if x > 200 goto L4
    goto L1
    L4: if x != y goto L2
    goto L1
    L2: x = 0
    L1:

Another way:

    if x < 100 goto L2
    if x <= 200 goto L1
    if x == y goto L1
    L2: x = 0
    L1:
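The straightforward translation can be reproduced by passing the true and false labels down as function arguments, which plays the role of the inherited attributes. This is a sketch under assumptions: the class `JumpGen`, the tuple encoding of conditions, and printing each label on its own line are illustrative choices.

```python
class JumpGen:
    def __init__(self):
        self.code = []
        self.labels = 0

    def newlabel(self):
        self.labels += 1
        return f"L{self.labels}"

    def emit(self, line):
        self.code.append(line)

    def label(self, name):
        self.emit(f"{name}:")

    def boolean(self, b, true, false):
        """Emit jumping code for b; true/false are the inherited labels."""
        kind = b[0]
        if kind == "||":
            b1_false = self.newlabel()
            self.boolean(b[1], true, b1_false)
            self.label(b1_false)
            self.boolean(b[2], true, false)
        elif kind == "&&":
            b1_true = self.newlabel()
            self.boolean(b[1], b1_true, false)
            self.label(b1_true)
            self.boolean(b[2], true, false)
        elif kind == "!":
            self.boolean(b[1], false, true)      # swap the two targets
        elif kind == "true":
            self.emit(f"goto {true}")
        elif kind == "false":
            self.emit(f"goto {false}")
        else:                                    # (relop, E1.addr, E2.addr)
            self.emit(f"if {b[1]} {kind} {b[2]} goto {true}")
            self.emit(f"goto {false}")

    def if_stmt(self, cond, body, next_label):   # S -> if (B) S1
        b_true = self.newlabel()
        self.boolean(cond, b_true, next_label)
        self.label(b_true)
        for line in body:
            self.emit(line)


jgen = JumpGen()
s_next = jgen.newlabel()   # label following the if statement
cond = ("||", ("<", "x", "100"), ("&&", (">", "x", "200"), ("!=", "x", "y")))
jgen.if_stmt(cond, ["x = 0"], s_next)
jgen.label(s_next)
print("\n".join(jgen.code))
```

No code is emitted for || and && themselves: short-circuiting falls out of which label each subexpression is told to jump to.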


Backpatching

● S -> if (B) S1 required us to pass labels for evaluating B.

– We did it using inherited attributes.

● Alternatively, we could leave the label unspecified,

– and fill it in later.

● Called backpatching.

– A general concept for one-pass code generation.

– Self reading: Section 6.7 of Dragon book.
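The idea of leaving a jump target unspecified and filling it in later can be sketched in a few lines. The class `Code` and the method `emit_jump` are hypothetical names for this illustration; jump targets are instruction indices rather than symbolic labels, and the lists of pending jumps are merged by plain list concatenation.

```python
class Code:
    def __init__(self):
        self.instrs = []   # each entry: [text, jump target or None]

    def nextinstr(self):
        return len(self.instrs)

    def emit(self, text):
        self.instrs.append([text, None])

    def emit_jump(self, text):
        """Emit a jump with an unknown target; return a list with its index."""
        self.instrs.append([text, "?"])
        return [len(self.instrs) - 1]

    def backpatch(self, holes, target):
        """Fill in target for every pending jump listed in holes."""
        for i in holes:
            self.instrs[i][1] = target

    def render(self):
        return [t if tgt is None else f"{t} {tgt}" for t, tgt in self.instrs]


code = Code()

# if (x < 100 || x > 200) x = 0, generated in a single pass
b1_true = code.emit_jump("if x < 100 goto")
b1_false = code.emit_jump("goto")
code.backpatch(b1_false, code.nextinstr())   # B1 false: continue with B2
b2_true = code.emit_jump("if x > 200 goto")
b2_false = code.emit_jump("goto")

truelist = b1_true + b2_true                 # jumps taken when B is true
falselist = b2_false                         # jumps taken when B is false

code.backpatch(truelist, code.nextinstr())   # B true: start of S1
code.emit("x = 0")
code.backpatch(falselist, code.nextinstr())  # B false: just after S1

print("\n".join(code.render()))
```

The targets for B become known only once the code for S1 has been placed, so no label needs to be passed down in advance.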


Translating break and continue

● break and continue are special (disciplined?) gotos.

● Their IR needs

– currently enclosing loop/switch.

– goto to a label just outside/before the enclosing block.

● How to generate the 3AC for break?

– either pass on the enclosing block and label as inherited attributes, or

– use backpatching to fill in the label of goto.

● For continue?
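The inherited-attribute option can be sketched by passing the labels of the innermost enclosing loop down to every nested statement. This is a simplified illustration under assumptions: the class `LoopGen` is hypothetical, conditions are kept as opaque text, only while loops are handled (not switch), and each loop creates and places its own exit label instead of inheriting S.next.

```python
class LoopGen:
    def __init__(self):
        self.code = []
        self.labels = 0

    def newlabel(self):
        self.labels += 1
        return f"L{self.labels}"

    def emit(self, line):
        self.code.append(line)

    def block(self, stmts, loop=None):
        for s in stmts:
            self.stmt(s, loop)

    def stmt(self, s, loop):
        """loop is (begin, exit) of the innermost enclosing while, or None."""
        kind = s[0]
        if kind == "while":                       # ("while", cond, body)
            begin, exit_ = self.newlabel(), self.newlabel()
            self.emit(f"{begin}:")
            self.emit(f"if !({s[1]}) goto {exit_}")
            self.block(s[2], loop=(begin, exit_))
            self.emit(f"goto {begin}")
            self.emit(f"{exit_}:")
        elif kind == "if":                        # ("if", cond, body)
            skip = self.newlabel()
            self.emit(f"if !({s[1]}) goto {skip}")
            self.block(s[2], loop)                # the loop context passes through
            self.emit(f"{skip}:")
        elif kind in ("break", "continue"):
            if loop is None:
                raise SyntaxError(f"{kind} outside of a loop")
            begin, exit_ = loop
            self.emit(f"goto {exit_ if kind == 'break' else begin}")
        else:                                     # ("simple", text)
            self.emit(s[1])


lgen = LoopGen()
# while (x < 10) { if (x == 5) break; x = x + 1 }
lgen.block([
    ("while", "x < 10", [
        ("if", "x == 5", [("break",)]),
        ("simple", "x = x + 1"),
    ]),
])
print("\n".join(lgen.code))
```

In this sketch break jumps to the label just after the loop, and continue jumps back to the label where the loop condition is re-evaluated.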


Translating switch statements

    switch (E) {
        case V1: S1
        case V2: S2
        ...
        case Vn-1: Sn-1
        default: Sn
    }

● Using nested if-else

● Using a table of pairs <Vi, Si>

● Using a hash-table

– when n is large (say >10)

● Special case when the Vi are consecutive integers

– Indexed array would be sufficient
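One possible shape for the generated code is a sequence of tests followed by the case bodies, which corresponds to walking a small table of <Vi, Si> pairs. The function `gen_switch` and its label naming are assumptions for illustration; a hash table or an indexed array of labels would replace the test sequence in the other two strategies.

```python
def gen_switch(expr, cases, default):
    """cases: list of (value, body lines); default: body lines.

    Bodies are laid out one after another, so a case without an explicit
    jump at its end falls through to the next one, as in C.
    """
    code = [f"t = {expr}"]
    labels = [f"L{i + 1}" for i in range(len(cases) + 1)]
    for (value, _), label in zip(cases, labels):
        code.append(f"if t == {value} goto {label}")
    code.append(f"goto {labels[-1]}")            # no value matched: default
    for (_, body), label in zip(cases, labels):
        code.append(f"{label}:")
        code.extend(body)
    code.append(f"{labels[-1]}:")
    code.extend(default)
    return code


switch_code = gen_switch("x", [(1, ["y = 10"]), (2, ["y = 20"])], ["y = 0"])
print("\n".join(switch_code))
```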


Where are we?

● Learnt to generate 3AC for:

– Expressions

– Array references

– Control-flow statements

● Key learning:

– Generate code for yourself; trust the family to patch-up

● Next class (when?):

– Translating classes/structures, objects, object references.