Intermediate Code Generation 2




Compiler Design, WS 2005/2006

98 Semantic Analysis

99 Semantic Analysis

– Goal: check correctness of program and enable proper execution.

• gather semantic information, e.g. symbol table.

• check semantic rules (type checking).

– Often used approach:

• basis: context-free grammar

• associate information with language constructs by attaching attributes (properties) to grammar symbols.

• specify semantic rules (attribute equations) for grammar productions to compute the values of the attributes.

• Attribute grammar: context-free grammar with attributes and semantic rules.

100 Attribute Grammar (Attributierte Grammatik)

– Synthesized attributes (synthetisierte Attribute): An attribute at a node is synthesized if its value is computed from the attribute values of the children of that node in the parse tree.

– Inherited attributes (ererbte Attribute): An attribute at a node is inherited if its value is computed from attribute values of the parent and/or siblings of that node in the parse tree.

[Figure: attribute flow at a node n - synthesized values flow upward into n from its children; inherited values flow into n from its parent and siblings.]

101 Example: Synthesized Attributes, Calculator

– integer-valued synthesized attribute “val“ with each nonterminal.

– integer-valued synthesized attribute “lexval“ supplied by lexical analyzer.

Production          Semantic Rule
L → E               print(E.val)
E → E1 + T          E.val := E1.val + T.val
E → T               E.val := T.val
T → T1 ∗ F          T.val := T1.val ∗ F.val
T → F               T.val := F.val
F → ( E )           F.val := E.val
F → number          F.val := number.lexval
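As an illustration of how these purely synthesized rules can be evaluated bottom-up, here is a hypothetical recursive-descent evaluator in C that computes val while parsing (single-digit numbers only, no error handling; all names are invented for this sketch):

```c
/* Sketch: each parse function returns the synthesized attribute "val"
 * of its nonterminal, mirroring the semantic rules above. */
static const char *p;               /* input cursor */

static int parse_E(void);

static int parse_F(void)            /* F -> number | ( E ) */
{
    if (*p == '(') {
        p++;                        /* consume '(' */
        int v = parse_E();          /* F.val := E.val */
        p++;                        /* consume ')' */
        return v;
    }
    return *p++ - '0';              /* F.val := number.lexval */
}

static int parse_T(void)            /* T -> T1 * F | F */
{
    int v = parse_F();              /* T.val := F.val */
    while (*p == '*') { p++; v *= parse_F(); }   /* T.val := T1.val * F.val */
    return v;
}

static int parse_E(void)            /* E -> E1 + T | T */
{
    int v = parse_T();              /* E.val := T.val */
    while (*p == '+') { p++; v += parse_T(); }   /* E.val := E1.val + T.val */
    return v;
}

int calc(const char *s) { p = s; return parse_E(); }   /* L -> E: print(E.val) */
```

Because all attributes here are synthesized, the evaluation order coincides with a bottom-up traversal of the parse tree.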

102 Example: Inherited Attributes, Declaration

– Declaration consists of type T followed by a list of variables.

– Type of T is synthesized and inherited to L:

• “type“ synthesized attribute.

• “in“ inherited attribute.

– Procedure “symtab“ adds type of each identifier to its entry in the symbol table pointed to by attribute entry.

Production          Semantic Rule
D → T L             L.in := T.type
T → int             T.type := integer
T → real            T.type := real
L → L1 , id         L1.in := L.in; symtab(id.entry, L.in)
L → id              symtab(id.entry, L.in)

103 Attribute Grammar

– Consider the production X0 → X1 X2 … Xn and the attributes a1, …, ak, with the attribute values denoted by Xi.aj, together with the following attribute equation:

– Attribute computation:

• Attribute evaluator:

»Attribute evaluator derived from attribute equations.

»Attribute evaluation has to follow an evaluation order.

» Evaluation order based on dependency graph – multiple passes of parse tree may be necessary.

• Attribute computation during parsing:

» Restricted attributed grammars are required.

Xi.aj = fi,j(X0.a1, …, X0.ak, …, Xn.a1, …, Xn.ak)

104 Attribute Computation During Parsing

– LL/LR-parsing: processes input from left to right

• Consequence: Attribute evaluation has to correspond to left-to-right traversal of parse tree.

– Synthesized attributes:

• Children of a node can be processed in arbitrary order (in particular also from left to right).

– Inherited attributes:

• Backward dependencies are not allowed, i.e. dependencies from right to left in the parse tree.

105 Restricted Attribute Grammars

– S-attributed grammar: (S … for synthesized) Attribute grammar in which all attributes are synthesized, i.e. for A → X1 X2 … Xn all semantic rules have the form:

  A.a = f(X1.a1, …, X1.ak, …, Xn.a1, …, Xn.ak)

– L-attributed grammar: (L … for left to right) An attribute grammar is called L-attributed if for each production X0 → X1 X2 … Xn and for each inherited attribute aj the semantic rules are all of the form:

  Xi.aj = fi,j(X0.a1, …, X0.ak, X1.a1, …, Xi−1.ak)

– Every S-attributed grammar is L-attributed.

– Synthesized attributes during LR parsing: simple (see example).

– Inherited attributes during LR parsing: can cause problems.

106 Inherited Attributes and LR Parsing

– LR parsers put off decision on which production to use in a derivation until the RHS of the production is fully formed.

– This makes it difficult for inherited attributes to be made available.

– YACC example:

    A: B { ...some action... } C;

  is interpreted as:

    A: B U C;
    U: { ...some action... } ;

– Note: semantic actions can add new parsing conflicts!

107 Symbol Table

– Central repository for distinct kinds of information. Alternative: store the information directly in the intermediate representation (e.g. as attributes).

– Variety of names:

• variables, defined constants, type definitions, procedures, compiler-generated variables, etc.

– Typical symbol table entry:

• identifier

• type information

• scope information

• memory location

– Principal symbol table operations: insert, lookup.

– Often realized as hash table.
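As a sketch of these two operations over a chained hash table (the entry layout, table size, and hash function are illustrative choices, not fixed by the slides):

```c
#include <stdlib.h>
#include <string.h>

#define TABSIZE 211                  /* often a prime, cf. the hash slides */

typedef struct Sym {                 /* typical entry, as listed above */
    char       *name;                /* identifier         */
    char       *type;                /* type information   */
    int         scope;               /* scope information  */
    int         offset;              /* memory location    */
    struct Sym *next;                /* chain on collision */
} Sym;

static Sym *table[TABSIZE];

static unsigned hash(const char *s)
{
    unsigned h = 0;
    while (*s)
        h = 65599u * h + (unsigned char)*s++;
    return h % TABSIZE;
}

Sym *lookup(const char *name)
{
    for (Sym *e = table[hash(name)]; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e;
    return NULL;
}

Sym *insert(const char *name, const char *type, int scope, int offset)
{
    unsigned h = hash(name);
    Sym *e = malloc(sizeof *e);
    e->name   = strcpy(malloc(strlen(name) + 1), name);
    e->type   = strcpy(malloc(strlen(type) + 1), type);
    e->scope  = scope;
    e->offset = offset;
    e->next   = table[h];            /* external chaining */
    table[h]  = e;
    return e;
}
```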

108 Hash Functions (1)

– The hash function must distribute the keys uniformly.

– The hash function should be efficient.

– It is strongly recommended to choose a prime number for PMAX.

– Consider hash functions for strings of length k with characters ci, 1 ≤ i ≤ k, and let h be the computed value. The index is obtained as “h mod PMAX”.

• x^α: (see x65599, x16, x5, x2, x1 in the table).

• quad: 4 consecutive characters form an integer; these integers are added up to give h.

• middle: h is taken from the middle 4 characters of the string.

• ends: adds the first 3 and the last 3 characters of the string to give h.

  h_i = α × h_{i−1} + c_i for 1 ≤ i ≤ k, with h_0 = 0 and h = h_k.
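The x^α recurrence can be written directly in C; the following is a sketch with alpha and pmax as parameters (alpha = 65599 and pmax = 211 reproduce x65599 with the table size used in the experiment below):

```c
/* h_0 = 0;  h_i = alpha * h_{i-1} + c_i;  index = h_k mod pmax */
unsigned hash_alpha(const char *s, unsigned alpha, unsigned pmax)
{
    unsigned h = 0;                          /* h_0 = 0 */
    for (; *s != '\0'; s++)
        h = alpha * h + (unsigned char)*s;   /* h_i = alpha*h_{i-1} + c_i */
    return h % pmax;                         /* index obtained as h mod PMAX */
}
```

The intermediate overflow of the unsigned accumulator is harmless (wrap-around modulo 2^32) and is in fact part of why x65599 is fast.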

109 Hash Functions (2)

– Function hashpjw in C (P.J. Weinberger's C compiler, see the Dragon Book):

    #define PRIME 211
    #define EOS   '\0'

    int hashpjw(char *s)
    {
        char *p;
        unsigned h = 0, g;
        for (p = s; *p != EOS; p++) {
            h = (h << 4) + (*p);
            if (g = h & 0xf0000000) {
                h = h ^ (g >> 24);
                h = h ^ g;
            }
        }
        return h % PRIME;
    }

110 Experiment with Hash Functions (1)

(Experiment from A.V. Aho, R. Sethi, J.D. Ullman: Compilers)

The following test suite is used:

1. The 50 most frequent names and keywords from a selection of C programs.

2. As 1., but with the 100 most frequent names.

3. As 1., but with the 500 most frequent names.

4. 952 external names in the UNIX operating system.

5. 627 names from a C program generated from C++.

6. 915 random names.

7. 614 words from Chapter 3.1 of the “Dragon” book (Compilers).

8. 1201 English words with “xxx” as prefix and suffix.

9. The 300 words v100, v101, …, v399.

111 Experiment with Hash Functions (2)

– Collisions are resolved by external chaining with lists.

– A table of size 211 is used.

– The basis for the comparison is the lengths of the lists: for each hash function the list lengths are recorded and a distribution measure is computed from them. This measure is normalized to 1 by dividing it by a theoretically computed uniform distribution measure.

– I.e. values around 1 represent a uniform distribution.

– The numbers in the diagram are the individual test cases of the test suite; the best hash functions come first. (Why x65599: a prime near 2^16 that quickly overflows 32-bit integers.)

112 Experiment with Hash Functions (3)

[Figure: diagram of the normalized distribution measures per hash function and test case.]

113 Symbol Table, Handling Nested Scopes (1)

Approach: Separate symbol table for each scope.

static int w;            // L0
int x;

void f(int a, int b) {   // L1
    int c;
    {   int b, z;        // L2a
        ...
    }
    {   int a, x;        // L2b
        ...
        {   int c, x;    // L3
            b = a + b + c + w;
        }
    }
}

[Figure: one symbol table per scope, each linked to the table of the enclosing scope - L0: w, x, f; L1: a, b, c; L2a: b, z; L2b: a, x; L3: c, x. tblptr points to the table of the current scope.]

114 Symbol Table, Handling Nested Scopes (2)

Operations:

1. mktable(previous): creates new table

2. enter(table, name, type, offset): creates new entry for name

3. addwidth(table, width): stores width of all entries of table in header of table

4. enterproc(table, name, corrtable): creates new entry for procedure name.

Stacks are kept for the symbol tables (tblptr) and the offsets (offset).

Translation Scheme: ASU, Chapter 8

D → proc id ; N D ; S    { t := top(tblptr);
                           addwidth(t, top(offset));
                           pop(tblptr); pop(offset);
                           enterproc(top(tblptr), id.name, t); }

D → id : T               { enter(top(tblptr), id.name, T.type, top(offset));
                           top(offset) := top(offset) + T.width; }

N → ε                    { t := mktable(top(tblptr));
                           push(t, tblptr); push(0, offset); }
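A possible C sketch of operations 1.-3. (the record layout is invented for illustration; the slides do not prescribe one):

```c
#include <stdlib.h>
#include <string.h>

typedef struct Entry {
    char *name, *type;
    int offset;
    struct Entry *next;
} Entry;

typedef struct Table {
    struct Table *prev;      /* link to table of enclosing scope */
    Entry        *entries;
    int           width;     /* total width of all entries       */
} Table;

Table *mktable(Table *previous)       /* 1. creates new table */
{
    Table *t = calloc(1, sizeof *t);
    t->prev = previous;
    return t;
}

void enter(Table *t, const char *name, const char *type, int offset)
{                                     /* 2. creates new entry for name */
    Entry *e = malloc(sizeof *e);
    e->name    = strcpy(malloc(strlen(name) + 1), name);
    e->type    = strcpy(malloc(strlen(type) + 1), type);
    e->offset  = offset;
    e->next    = t->entries;
    t->entries = e;
}

void addwidth(Table *t, int width)    /* 3. stores width in table header */
{
    t->width = width;
}
```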

115 Type Checking

– Types:

• simple types

• structured types: array, struct, union, pointer

– Type equivalence: (Are two type expressions equivalent?)

• structural/declaration/name-equivalence

– Type checking:

• Expressions: e.g. usage of operators

• Statements: e.g. Boolean type of conditional expressions

– Additional topics in type checking:

• type conversion

• overloading

• etc.

116 Structural/Declaration/Name-Equivalence

Structural equivalence: Two types are equivalent, if they have the same structure (i.e. consist of the same components).

Name equivalence: Two types are equivalent, if they have either the same simple type or they have the same type name.

Declaration equivalence: Type aliases are supported (weaker version of name equivalence). Two types are equivalent, if they “lead back to“ the same type name.

struct A {int a; float b;} a;
struct B {int c; float d;} b;
typedef struct A C;
C c, c1;

            struct. equiv.   decl. equiv.   name equiv.
a = b;      ok               error          error
a = c;      ok               ok             error
c = c1;     ok               ok             ok

117 Intermediate Representations

118 Intermediate Representation (Zwischendarstellung)

– Intermediate representation (IR): compile-time data structure that represents the source program during translation.

– IR design is more art than science.

• A compiler may need several different IRs.

• The best choice depends on the tasks to be performed.

– No widespread agreement on this subject.

119 Taxonomy: Axis 1

Organizational structure:

1. Structural representations

– trees, e.g. parse tree, abstract syntax tree

– graphs, e.g. control flow graph

2. Linear representations

– pseudo-code for some abstract machine, e.g. three-address code

3. Hybrid representations

– combination of graphs and linear code, e.g. control flow graph

120 Taxonomy: Axis 2

Level of abstraction:

1. High-level IR

– close to source program

– appropriate for high-level optimizations (loop transformations) and source-to-source translators

– e.g. control flow graph

2. Medium-level IR

– represents source variables, temporaries

– reduces control flow to un-/conditional branches

– machine independent, powerful instruction set

3. Low-level IR

– almost target machine instructions

121 Examples of Intermediate Representations

– parse tree (cf. syntax analysis)

– abstract syntax tree (AST) - (see next pages)

– directed acyclic graph (DAG) - (see next pages)

– control flow graph (CFG) - (see next pages)

– program dependence graph (PDG)

– static single assignment form (SSA)

– stack code

– three address code - (see next pages)

– hybrid combinations

122 (Abstract) Syntax Tree (1)

Abstract syntax tree:

• condensed form of the parse tree (concrete syntax tree).

• superficial nodes are omitted - more efficient.

• not unique (the syntax tree - unlike the parse tree - is not defined by the grammar).

[Figure: parse tree of an addition over operands a, with interior nodes E, beside the condensed abstract syntax tree: a + node whose children are the a leaves.]

123 (Abstract) Syntax Tree (2)

if-statement:

[Figure: two parse trees - one for “if ( exp ) stmt else stmt”, one with a separate else-part nonterminal - beside the common abstract syntax tree: an if node with the children exp, stmt1, stmt2.]

124 Directed Acyclic Graph (DAG)

– Directed acyclic graph DAG:

• contraction of the AST that avoids duplication: identical subtrees are reused.

• exposes redundancies: changes (assignments, calls) ?

• smaller memory footprint

– Example: a ∗ (a-b) + c ∗ (a-b)

[Figure: AST with two separate (a-b) subtrees under the two ∗ nodes; DAG in which a single shared (a-b) node feeds both ∗ nodes, combined by +.]

125 CFG: Basic Blocks

– Program is broken up into set of basic blocks.

– Def. Basic block: maximal-length sequence of instructions I1, …, In (n ≥ 1) with exactly one entry point (I1) and exactly one exit point (In). (I.e. no branch instructions except perhaps the last instruction, and no branch targets except perhaps at the first instruction.)

126 CFG: Definition

– CFG models the flow of control of a procedure:
  • each node represents a basic block,
  • each edge represents a potential flow of control.

– Def.: A control flow graph G is a quadruple G = (N, E, s, e), where (N, E) is a directed graph with nodes n ∈ N representing basic blocks and edges in E modeling the nondeterministic branching structure of G, s ∈ N is the entry node, e ∈ N is the exit node, and there is a path from s to every node of G.

– Predecessors: preds(x) = { u | (u, x) ∈ E }

– Successors: succs(x) = { u | (x, u) ∈ E }

– Start/end node properties:
  preds(s) = ∅ (the entry node has no predecessors)
  succs(e) = ∅ (the exit node has no successors)

127 Construction CFG: Nodes

Input: Sequence of instructions (linear IR)

Output: List of basic blocks.

Method.

1. Find set of leaders (first statements of basic blocks):

i. First stmt of a function is a leader.

ii. Any stmt that is target of some jump is a leader.

iii. Any stmt that follows some jump stmt is a leader.

2. For each leader: all stmts up to but not including next leader or end of function form its basic block.
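Step 1 can be sketched in C as follows (the Instr record is an assumption: is_jump marks jump statements, target holds the index of the jump target, or -1 for none):

```c
#include <stdbool.h>

typedef struct {
    bool is_jump;   /* statement is an un-/conditional jump */
    int  target;    /* index of the jump target, or -1      */
} Instr;

/* Mark leaders[i] = true for every basic-block leader (rules i.-iii.). */
void find_leaders(const Instr *code, int n, bool *leaders)
{
    for (int i = 0; i < n; i++)
        leaders[i] = false;
    if (n > 0)
        leaders[0] = true;                      /* i.   first stmt of function */
    for (int i = 0; i < n; i++) {
        if (code[i].is_jump) {
            if (code[i].target >= 0 && code[i].target < n)
                leaders[code[i].target] = true; /* ii.  target of a jump       */
            if (i + 1 < n)
                leaders[i + 1] = true;          /* iii. stmt following a jump  */
        }
    }
}
```

Step 2 then only needs a linear scan: each leader opens a block that runs up to the next leader or the end of the function.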

128 Construction CFG: Edges

Input: List of basic blocks BBi of a procedure.

Output: CFG.

Method.

1. There is an edge from BBs to BBt , if

i. there is a jump from the last stmt of BBs to the first stmt of BBt, or

ii. BBt immediately follows BBs in the program and BBs does not end with an unconditional jump.

2. If unique entry and exit nodes do not exist, create such nodes.

129 Example: Control Flow Graph and Basic Blocks

Procedure SQRT(L) for a non-negative integer L:

B1:     READ(L)
        N=0
        K=0
        M=1

B2: L1: K=K+M
        C=K>L
        IF C GOTO L2

B3:     N=N+1
        M=M+2
        GOTO L1

B4: L2: WRITE(N)

[Figure: CFG with edges B1→B2, B2→B3 (fall through), B2→B4 (branch taken), B3→B2.]

130 Three-Address Code

– Stands for a variety of representations.

– In general, instructions of the form:

  res := arg1 op arg2

• an operator op

• at most two operands arg1 and arg2, and one result res.

– Common three-address statements:

• simple assignments

• conditional/unconditional jumps

• indexed assignment: x := y[i], x[i] := y

• address and pointer assignments: x := & y, x := *y

• some more complex operations ?

131 Representing Three-Address Code

1. Quadruples:

– record structure with four fields: op, arg1, arg2, res.

2. Triples:

– refer to a temporary by the location of the instruction that computes it

– Requirement: Three-address instructions must be referenceable.

– Assumption: if a three-address instruction contains all three addresses, the target address is a temporary.

3. Indirect triples:

– listing pointers to triples rather than listing the triples themselves
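The three encodings can be sketched as C record types, here filled with a quadruple sequence for z := x - 2 ∗ y (the field names and the char encoding of op are illustrative):

```c
#include <stddef.h>

typedef struct {                 /* 1. quadruple: op, arg1, arg2, res */
    char        op;
    const char *arg1, *arg2;
    const char *res;             /* explicit result field */
} Quad;

/* z := x - 2 * y as quadruples */
Quad code[] = {
    {'=', "2",  NULL, "t1"},     /* t1 := 2       */
    {'=', "y",  NULL, "t2"},     /* t2 := y       */
    {'*', "t1", "t2", "t3"},     /* t3 := t1 * t2 */
    {'=', "x",  NULL, "t4"},     /* t4 := x       */
    {'-', "t4", "t3", "t5"},     /* t5 := t4 - t3 */
    {'=', "t5", NULL, "z" },     /* z  := t5      */
};

typedef struct Triple {          /* 2. triple: no result field; a temporary  */
    char op;                     /*    is named by the index of the          */
    int  arg1, arg2;             /*    instruction that computes it (named   */
} Triple;                        /*    operands would need a tagged union)   */

typedef Triple **IndirectTriples;/* 3. indirect triples: a list of pointers  */
                                 /*    to triples, reorderable without       */
                                 /*    renumbering the triples themselves    */
```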

132 Example: Quadruples and Triples

z := x - 2 ∗ y

Quadruples:
        op    arg1   arg2   res
(1)     :=    2             t1
(2)     :=    y             t2
(3)     ∗     t1     t2     t3
(4)     :=    x             t4
(5)     −     t4     t3     t5
(6)     :=    t5            z

Triples:
        op    arg1   arg2
(1)     :=    2
(2)     :=    y
(3)     ∗     (1)    (2)
(4)     :=    x
(5)     −     (4)    (3)
(6)     :=    z      (5)

Indirect triples (statement list points into the triple pool):
        stmt list            triples
        (1) → (23)           (23)   :=   2
        (2) → (24)           (24)   :=   y
        (3) → (25)           (25)   ∗    (23)  (24)
        (4) → (26)           (26)   :=   x
        (5) → (27)           (27)   −    (26)  (25)
        (6) → (28)           (28)   :=   z     (27)

133 Pros/Cons of Three-Address Representations

– Quadruples:

• four fields, more memory

• easy to reorder

– Triples:

• three fields, memory efficient

• harder to reorder

– Indirect triples:

• about the same amount of memory as quadruples

• as easy to reorder as quadruples

134 Implementing Linear IRs: Quadruples

[Figure: the six quadruples of the example (t1 := 2, t2 := y, t3 := t1 ∗ t2, t4 := x, t5 := t4 - t3, z := t5) stored in three ways: (a) contiguously in an array, (b) as an array of pointers to quadruple records, (c) as a linked list of quadruple records.]

Pros / Cons ?

135 Addressing Array Elements

– Array storage layout:

• row-major order: row by row, rightmost subscript varies fastest (C)

• column-major order: column by column, leftmost subscript varies fastest (Fortran)

• indirection vectors (Java)

– Addressing:

• type A[low..high]

• A[i]

• address: base + (i - low) ∗ w

  base … base address of A[low]
  w …… sizeof(type)

136 Addressing Array Elements

A[i]
– base + (i - low) ∗ w

A[i,j], row-major
– base + (i - low1) ∗ len2 ∗ w + (j - low2) ∗ w
  with len2 = high2 - low2 + 1

A[i,j], column-major
– base + (j - low2) ∗ len1 ∗ w + (i - low1) ∗ w
  with len1 = high1 - low1 + 1

A[i], optimized
– i ∗ w + (base - low ∗ w)

A[i,j], row-major, optimized
– ((i ∗ len2) + j) ∗ w + (base - ((low1 ∗ len2) + low2) ∗ w)

A[i,j], column-major, optimized
– ((j ∗ len1) + i) ∗ w + (base - ((low2 ∗ len1) + low1) ∗ w)

137 Addressing Array Elements

– Generalization for A[i1,i2, … , ik]:

  (( … ((i1 ∗ len2 + i2) ∗ len3 + i3) ∗ … ) ∗ lenk + ik) ∗ w
  + base - (( … ((low1 ∗ len2 + low2) ∗ len3 + low3) ∗ … ) ∗ lenk + lowk) ∗ w

– Second line: evaluated statically by the compiler!
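The generalized row-major formula can be evaluated with one Horner pass over the subscripts; a sketch (len[j] = high_j - low_j + 1, array index j running 0..k-1 for i_1..i_k; the function name is invented):

```c
/* Address of A[i_1, ..., i_k], row-major.  The constant part
 * (base - cst * w) is what the compiler folds at compile time. */
long array_addr(long base, long w, int k,
                const long idx[], const long low[], const long len[])
{
    long var = 0, cst = 0;
    for (int j = 0; j < k; j++) {
        long scale = (j > 0) ? len[j] : 1;   /* len[0] is never needed   */
        var = var * scale + idx[j];          /* ((i1*len2+i2)*len3+…)+ik */
        cst = cst * scale + low[j];          /* same scheme over the low */
    }                                        /* bounds                   */
    return var * w + (base - cst * w);
}
```

For k = 2 this reduces to the row-major formula above: base + (i - low1) ∗ len2 ∗ w + (j - low2) ∗ w.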