COMPILERS Mrs K.M.Sanghavi SNJB’s KBJ COE, Chandwad

Upload: buithu

Post on 30-Jul-2018

INTRODUCTION

BASIC ELEMENTS RECOGNITION

SYNTACTIC UNIT RECOGNITION & MEANING INTERPRETATION

INTERMEDIATE REPRESENTATION

STORAGE ALLOCATION

CODE GENERATION

OPTIMIZATION

GENERAL MODEL OF COMPILER

PHASES OF COMPILER : Database, Tasks, Algorithm

INTRODUCTION

Compilation: translation of a program written in a source language into a semantically equivalent program written in a target language.

[Figure: Source Program → Compiler → Target Program, with the compiler also producing error messages; at run time, Input → Target Program → Output]

ROLE OF COMPILER

Recognize Certain Strings as basic elements

Recognize Combination of elements as syntactic units and interpret the meaning

Allocate Storage and Assign Locations

Generate Appropriate Object Code

PHASES OF COMPILER

Phases: Lexical Analysis, Syntax Analysis, Semantic Analysis, Intermediate Code Generation, Code Optimization, Code Generation

Supporting components: Error Handling, Symbol Table

[Figure: the phases in sequence, grouped into m/c independent and m/c dependent parts]

RECOGNISING BASIC ELEMENTS

Break the source program into small constituent pieces, i.e. scan the input and identify tokens

This is known as Lexical Analysis

This removes white space, new-line characters, …

The identified tokens (basic elements) are placed into symbol tables, which are then used by the other phases.

Discover lexical errors (e.g. invalid characters, improper identifiers) and send tokens to the parser
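
The scanning step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the token names, the keyword set and the regular expressions are hypothetical choices that cover the sample Pascal program used later in these slides, not a complete lexer.

```python
import re

# Assumed keyword set: only the reserved words of the sample program.
KEYWORDS = {"program", "var", "begin", "end", "integer",
            "input", "output", "readln", "writeln"}

# (token name, pattern) pairs; order matters because the first match wins.
TOKEN_SPEC = [
    ("literal",   r"'[^']*'"),               # characters enclosed in quotes
    ("id",        r"[A-Za-z][A-Za-z0-9]*"),  # letter followed by alphanumerics
    ("number",    r"\d+"),
    ("leftpar",   r"\("),
    ("rightpar",  r"\)"),
    ("semicolon", r";"),
    ("colon",     r":"),
    ("comma",     r","),
    ("fullstop",  r"\."),
    ("skip",      r"\s+"),                   # white space and new lines
    ("error",     r"."),                     # anything else is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (token, lexeme) pairs for the given source text."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "skip":
            continue  # white space is removed, not passed to the parser
        if kind == "error":
            raise SyntaxError(f"invalid character {lexeme!r} at offset {match.start()}")
        if kind == "id" and lexeme.lower() in KEYWORDS:
            kind = lexeme.lower()  # reserved words get their own token
        tokens.append((kind, lexeme))
    return tokens
```

Calling `tokenize("var x:integer;")` yields the token/lexeme pairs for `var`, `x`, `:`, `integer` and `;`, while a stray character such as `@` raises a lexical error.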

BASIC TERMINOLOGIES OF LEXICAL ANALYSIS

Token: a classification for a common set of strings. Examples include <Identifier>, <number>, etc.

Pattern: the rules which characterize the set of strings for a token. Recall file and OS wildcards ([A-Z]*.*)

Lexeme: the actual sequence of characters that matches a pattern and is classified by a token. Identifiers: x, count, name, etc.

RECOGNISING BASIC ELEMENTS

For example, the following code might result in the table given below:

    program foo(input,output);
    var x:integer;
    begin
      readln(x);
      writeln('value read =',x)
    end.

Lexeme           Token                    Pattern
program          program                  p,r,o,g,r,a,m
foo              id(foo)                  letter followed by seq. of alphanumerics
(                leftpar                  a left parenthesis
input            input                    i,n,p,u,t
,                comma                    a comma
output           output                   o,u,t,p,u,t
)                rightpar                 a right parenthesis
;                semicolon                a semicolon
var              var                      v,a,r
x                id(x)                    letter followed by seq. of alphanumerics
:                colon                    a colon
integer          integer                  i,n,t,e,g,e,r
;                semicolon                a semicolon
begin            begin                    b,e,g,i,n
readln           readln                   r,e,a,d,l,n
(                leftpar                  a left parenthesis
x                id(x)                    letter followed by seq. of alphanumerics
)                rightpar                 a right parenthesis
;                semicolon                a semicolon
writeln          writeln                  w,r,i,t,e,l,n
(                leftpar                  a left parenthesis
'value read ='   literal('value read =')  seq. of characters enclosed in quotes
,                comma                    a comma
x                id(x)                    letter followed by seq. of alphanumerics
)                rightpar                 a right parenthesis
end              end                      e,n,d
.                fullstop                 a fullstop

RECOGNISING SYNTACTIC UNITS

Recognize the phrases, i.e. the syntax, after getting tokens from the Lexical Analyzer

This is known as Syntax Analysis

This is associated with construction of an intermediate form

The rules which specify the syntax of the source language are used to recognize the syntactic units.

Discover syntax errors and sometimes also recover from them

RECOGNISING SYNTACTIC UNITS

program foo(input,output);     Valid procedure
var x:integer;                 Valid declaration
begin                          Valid begin statement
readln(x);                     Valid function
writeln('value read =',x)      Valid function
end                            Valid end statement

INTERPRETING MEANINGS FROM SYNTACTIC UNITS

Interpret the meaning of a construct

This is known as Semantic Analysis

This is also associated with construction of an intermediate form with meanings

This includes type checking of statements

INTERMEDIATE REPRESENTATION

Generation of an intermediate form of the program, from which object code is later produced

This is known as Intermediate Code Generation

It facilitates optimization of object code

It allows logical separation between m/c dependent and m/c independent phases

INTERMEDIATE REPRESENTATION OF…

Arithmetic statements: parse tree, matrix

Non-arithmetic statements: matrix

Non-executable statements (as such have no intermediate form): identifier tables

INTERMEDIATE REPRESENTATION OF ARITHMETIC STATEMENTS

Rules for converting an arithmetic statement into a parse tree:
• Any variable is a terminal node of the tree
• For every operator, a binary tree is constructed with the left node as operand1 and the right node as operand2.

INTERMEDIATE REPRESENTATION OF ARITHMETIC STATEMENTS

a = rate * (initial - final) + 2 * rate * (initial - final) is represented in parse tree form as follows:

    =
      a
      +
        *
          rate
          -
            initial
            final
        *
          *
            2
            rate
          -
            initial
            final

However, this method is not practical.

INTERMEDIATE REPRESENTATION OF ARITHMETIC STATEMENTS

The second way of representing arithmetic statements in intermediate form is the 'matrix'. An operand written Mn refers to the result of matrix entry n.

No.  Operator  Operand1  Operand2
1    -         initial   final
2    *         rate      M1
3    *         2         rate
4    -         initial   final
5    *         M3        M4
6    +         M2        M5
7    =         a         M6
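
As a rough sketch of how such a matrix can be produced, the code below walks an expression tree bottom-up and emits one entry per operator. It borrows Python's own `ast` parser as a stand-in for the compiler's syntax phase, which is an assumption made purely for illustration; the function name `to_matrix` and the `Mn` naming of results are likewise illustrative.

```python
import ast

OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

def to_matrix(statement):
    """Turn 'target = expression' into a list of (operator, operand1, operand2)."""
    assign = ast.parse(statement).body[0]
    matrix = []

    def emit(node):
        # Leaves (variables and constants) are terminal nodes of the tree.
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return str(node.value)
        if not isinstance(node, ast.BinOp):
            raise ValueError("only binary arithmetic is handled in this sketch")
        left = emit(node.left)    # operand1 is evaluated first
        right = emit(node.right)  # then operand2
        matrix.append((OPS[type(node.op)], left, right))
        return f"M{len(matrix)}"  # reference to the entry just created

    result = emit(assign.value)
    matrix.append(("=", assign.targets[0].id, result))
    return matrix
```

For the statement `a = rate * (initial - final) + 2 * rate * (initial - final)` this produces the same seven entries as the matrix above.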

INTERMEDIATE REPRESENTATION OF NON-ARITHMETIC STATEMENTS

Non-arithmetic statements are statements like:
• do-while
• return
• if, if-else
• while
• goto, etc.

These are represented by the 'matrix'

INTERMEDIATE REPRESENTATION OF NON-ARITHMETIC STATEMENTS

No.  Operator  Operand1  Operand2
1    return    X
2    end
3    <         a         B
4    if        3

INTERMEDIATE REPRESENTATION OF NON-EXECUTABLE STATEMENTS

Non-executable statements are statements like:
• declare
• dimension
• include

These have no intermediate form; however, their information is stored in tables.

STORAGE ALLOCATION

The semantic analysis phase updates the symbol table entries for identifiers with their name, type, base, etc.

The storage allocation routine then scans the symbol table and assigns a location to each identifier.

For example, start at location 0, then 4, and so on, advancing by the size of each identifier.

STORAGE ALLOCATION

Name     Base    Type   Location
a        binary  int    0
rate     binary  float  2
start    binary  int    6
initial  binary  int    8
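
A minimal sketch of such an allocation routine follows. The sizes (2 units for `int`, 4 for `float`) are an assumption inferred from the locations in the table above; a real compiler would take them from the target machine.

```python
# Assumed storage sizes, chosen so that the result matches the table above.
SIZES = {"int": 2, "float": 4}

def allocate(symbols):
    """Scan (name, base, type) entries in order and assign each a location."""
    location = 0
    table = []
    for name, base, typ in symbols:
        table.append({"name": name, "base": base, "type": typ, "location": location})
        location += SIZES[typ]  # the next identifier starts after this one
    return table

symbols = [
    ("a", "binary", "int"),
    ("rate", "binary", "float"),
    ("start", "binary", "int"),
    ("initial", "binary", "int"),
]
allocated = allocate(symbols)
```

Here `allocated` assigns locations 0, 2, 6 and 8 to `a`, `rate`, `start` and `initial`.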

CODE GENERATION

One scheme for generating code is to associate each type of matrix operator with a fragment of object code, collected in a code production table.

The matrix is then scanned and code is generated for each entry using the table.

CODE GENERATION: Code Production Table

Sr.No  Operator  Target Code
1      -         LOAD  1,&OPERAND1
                 SUB   1,&OPERAND2
                 STORE 1,M&N
2      *         LOAD  1,&OPERAND1
                 MUL   0,&OPERAND2
                 STORE 1,M&N
3      +         LOAD  1,&OPERAND1
                 ADD   1,&OPERAND2
                 STORE 1,M&N
4      =         LOAD  1,&OPERAND2
                 STORE 1,&OPERAND1

CODE GENERATION

As shown previously, a = rate * (initial - final) + 2 * rate * (initial - final) is represented in matrix form as follows. The code is then generated using the code production table. An operand written Mn refers to the result of matrix entry n.

No.  Operator  Operand1  Operand2   Generated code
1    -         initial   final      LOAD  1,initial
                                    SUB   1,final
                                    STORE 1,M1
2    *         rate      M1         LOAD  1,rate
                                    MUL   0,M1
                                    STORE 1,M2
3    *         2         rate       LOAD  1,#2
                                    MUL   0,rate
                                    STORE 1,M3
4    -         initial   final      LOAD  1,initial
                                    SUB   1,final
                                    STORE 1,M4
5    *         M3        M4         LOAD  1,M3
                                    MUL   0,M4
                                    STORE 1,M5
6    +         M2        M5         LOAD  1,M2
                                    ADD   1,M5
                                    STORE 1,M6
7    =         a         M6         LOAD  1,M6
                                    STORE 1,a
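
The table-driven scheme can be sketched as follows. The templates mirror the code production table from the slides; the Python representation (format strings, the `operand` helper, the `#` prefix for integer literals) is an assumption made for illustration only.

```python
# One template per operator, mirroring the code production table.
TEMPLATES = {
    "-": ["LOAD 1,{op1}", "SUB 1,{op2}", "STORE 1,{res}"],
    "*": ["LOAD 1,{op1}", "MUL 0,{op2}", "STORE 1,{res}"],
    "+": ["LOAD 1,{op1}", "ADD 1,{op2}", "STORE 1,{res}"],
    "=": ["LOAD 1,{op2}", "STORE 1,{op1}"],
}

def operand(text):
    """Integer literals are written as immediates (#2); names are used as-is."""
    return f"#{text}" if text.isdigit() else text

def generate(matrix):
    """Scan the matrix and expand the template of each entry's operator."""
    code = []
    for n, (op, a, b) in enumerate(matrix, start=1):
        for line in TEMPLATES[op]:
            code.append(line.format(op1=operand(a), op2=operand(b), res=f"M{n}"))
    return code

matrix = [
    ("-", "initial", "final"), ("*", "rate", "M1"), ("*", "2", "rate"),
    ("-", "initial", "final"), ("*", "M3", "M4"), ("+", "M2", "M5"),
    ("=", "a", "M6"),
]
code = generate(matrix)
```

The resulting `code` list holds the twenty instructions of the example, from `LOAD 1,initial` through `STORE 1,a`.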

CODE OPTIMIZATION

Is it good to generate code directly from the matrix?

• It may give rise to redundant code, as in matrix entries 1 and 4 above.

Is the best use of the machine being made?

• Instructions 12 and 14 of the generated code (STORE 1,M4 and MUL 0,M4) show that M4 is not used further, so an unnecessary store arises.

Can the machine code be generated using other techniques?

These questions give rise to optimization.

CODE OPTIMIZATION

The first issue refers to m/c independent optimization

• Optimality of the matrix

The second refers to m/c dependent optimization

• Optimality of the m/c code

M/C INDEPENDENT CODE OPTIMIZATION

Common Subexpression Elimination

• Remove redundant code, as in matrix entries 1 and 4 above, and use entry 1 in place of entry 4.

Constant Folding, i.e. compile-time evaluation of operations whose operands are constants

• For example, x = 2 + 4 becomes x = 6

Code Motion

• Move loop-invariant computations outside the loop
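
Constant folding in particular is easy to demonstrate. The sketch below again uses Python's `ast` module as a stand-in for the compiler's intermediate form (an assumption for illustration; the class and function names are hypothetical) and evaluates at "compile time" any operation whose two operands are constants. It relies on `ast.unparse`, available from Python 3.9.

```python
import ast
import operator

FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

class ConstantFolder(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first (bottom-up)
        if (type(node.op) in FOLDABLE
                and isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)):
            value = FOLDABLE[type(node.op)](node.left.value, node.right.value)
            return ast.copy_location(ast.Constant(value=value), node)
        return node

def fold(statement):
    """Return the statement with constant subexpressions evaluated."""
    tree = ConstantFolder().visit(ast.parse(statement))
    return ast.unparse(ast.fix_missing_locations(tree))
```

With this sketch, `fold("x = 2 + 4")` gives `x = 6`, and an expression such as `y = a + 2 * 3` has only its constant part folded.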

COMMON SUBEXPRESSION

Replace references to entry 4 with entry 1 and delete the 4th statement. An operand written Mn refers to the result of matrix entry n.

No.  Operator  Operand1  Operand2
1    -         initial   final
2    *         rate      M1
3    *         2         rate
4    -         initial   final      (deleted)
5    *         M3        M4 → M1
6    +         M2        M5
7    =         a         M6
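
A small sketch of this elimination over the matrix is given below. It assumes straight-line code in which no variable is reassigned between the duplicate entries, which is the situation in the example; the function name and the decision to keep the original entry numbers rather than renumber are illustrative choices.

```python
def eliminate_common_subexpressions(matrix):
    """Drop repeated (operator, operand1, operand2) entries and redirect
    later references to the first occurrence. Entry numbers are kept."""
    seen = {}     # (op, operand1, operand2) -> reference of first occurrence
    renamed = {}  # reference of a deleted entry -> surviving reference
    result = []
    for n, (op, a, b) in enumerate(matrix, start=1):
        a, b = renamed.get(a, a), renamed.get(b, b)
        key = (op, a, b)
        if op != "=" and key in seen:
            renamed[f"M{n}"] = seen[key]  # e.g. M4 now means M1
            continue                      # the duplicate entry is deleted
        if op != "=":
            seen[key] = f"M{n}"
        result.append((n, op, a, b))
    return result

matrix = [
    ("-", "initial", "final"), ("*", "rate", "M1"), ("*", "2", "rate"),
    ("-", "initial", "final"), ("*", "M3", "M4"), ("+", "M2", "M5"),
    ("=", "a", "M6"),
]
optimized = eliminate_common_subexpressions(matrix)
```

In `optimized`, entry 4 is gone and entry 5 reads `* M3 M1`.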

CODE MOTION

For example, before code motion:

    a=1
    while(a+3<=10) { cout<<"Hello"; }

After code motion, the loop-invariant computation a+3 is evaluated once, outside the loop:

    a=1
    b = a+3
    while(b<=10) { cout<<"Hello"; }

M/C DEPENDENT CODE OPTIMIZATION

Use proper m/c instructions. For example, instead of

    MOV a,R
    ADD R,1
    MOV R,a

we can just use

    INC a

Make efficient use of registers instead of temporaries, thereby reducing the number of loads and stores

GENERAL MODEL OF A COMPILER

Phases:
1) Lexical Analysis
2) Syntax Analysis
3) Interpretation
4) M/c independent optimization
5) Storage Assignment
6) Code Selection
7) Assembly and output

Data passed between phases: Source Code, Uniform Symbol Table, Matrix, Optimized Matrix, Assembly Code, Relocatable Machine Code

Tables used by the phases: Terminal Table, Reductions, Identifier Table, Literal Table

GENERAL MODEL OF A COMPILER

Sr.No  Database: Description
1      Source Code: any program
2      Uniform Symbol Table: consists of a full or partial list of tokens. Created by lexical analysis
3      Terminal Table: permanent table listing the keywords and special symbols
4      Identifier Table: contains all variables in the program, temporary storage and any other information. Created by lexical analysis and modified by interpretation
5      Literal Table: contains all constants. Created by lexical analysis and referenced by interpretation
6      Reductions: permanent table of decision rules in the form of patterns for matching with the Uniform Symbol Table to discover the syntax
7      Matrix: intermediate form of the program, created by action routines, optimized and then used for code generation
8      Code Productions: permanent table of definitions

Phases of Compiler : Lexical Phase

Tasks:
• Parse the source program into tokens
• Build the literal and identifier tables
• Build the symbol table

Databases: Source Program, Terminal Table, Literal Table, Identifier Table, Uniform Symbol Table

Algorithm: the input string is separated into tokens by break characters. Consecutive non-break characters are accumulated into tokens.

Phases of Compiler : Lexical Phase : Database

Database          Format of an entry
Terminal Table    Symbol | Indicator | Precedence
Literal Table     Literal | Base | Scale | Precision | Other information | Address
Identifier Table  Name | Data Attributes | Address
Symbol Table      Table | Index

Phases of Compiler : Lexical Phase : Algorithm

The input string is separated into tokens.

Consecutive non-break characters are accumulated into tokens.

The analyzer compares the tokens against the entries in the terminal table.

If a match is found, the analyzer creates a symbol table entry of type 'TRM'; otherwise it checks whether the token is a literal ('LIT') or an identifier ('IDN').

For an identifier, it checks whether the entry is already in the symbol table. If not, a new entry is made.
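
The algorithm can be sketched roughly as below. The contents of the terminal table, the set of break characters, and the simplification that only unsigned integers count as literals are all assumptions made to keep the example short.

```python
# Assumed terminal table (keywords and special symbols) and break characters.
TERMINALS = ["program", "var", "begin", "end", "integer",
             "(", ")", ";", ":", ",", "."]
BREAK_CHARS = set(" \t\n();:,.")

def scan(source):
    """Return (uniform symbol table, literal table, identifier table)."""
    uniform, literals, identifiers = [], [], []

    def classify(token):
        if token in TERMINALS:                 # match in the terminal table
            uniform.append(("TRM", TERMINALS.index(token)))
        elif token.isdigit():                  # simplified literal check
            if token not in literals:
                literals.append(token)
            uniform.append(("LIT", literals.index(token)))
        else:                                  # identifier
            if token not in identifiers:       # new entry only if not present
                identifiers.append(token)
            uniform.append(("IDN", identifiers.index(token)))

    current = ""
    for ch in source:
        if ch in BREAK_CHARS:
            if current:                        # a break character ends a token
                classify(current)
                current = ""
            if not ch.isspace():               # special symbols are tokens too
                classify(ch)
        else:
            current += ch                      # accumulate non-break characters
    if current:
        classify(current)
    return uniform, literals, identifiers
```

Each uniform symbol table entry is a (table, index) pair, so a second occurrence of an identifier points to the same identifier table entry as the first.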

Phases of Compiler : Syntax Phase

Tasks:
• Recognize the major constructs of the language
• Call the appropriate action routine to generate the intermediate form
• Act as an interpreter for the reductions

Databases: Uniform Symbol Table, Reduction Table

Algorithm: the input buffer is checked against the stack, and reductions are performed according to the reduction table entries.

Phases of Compiler : Syntax Analysis Phase : Database and Conventions

Reduction Table: syntax rules, each entry of the form
    Label: Top of Stack / Action Routine / New Top of Stack / Reduction

Uniform Symbol Table: each entry of the form
    Table | Index

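Phases of Compiler : Syntax Analysis Phase : LR Parsing Algorithm

[Figure: LR parsing algorithm. A stack holding states and grammar symbols (S0 X1 S1 ... Xm-1 Sm-1 Xm Sm, with Sm on top), an input buffer a1 ... ai ... an $, and an output. The parser is driven by two tables indexed by state: the Action Table, with one column per terminal and $, whose entries are one of four different actions, and the Goto Table, with one column per non-terminal, where each item is a state number.]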

Phases of Compiler : Syntax Analysis Phase : Algorithm

Reductions are tested consecutively for a match between the Top of Stack and the Input Buffer until a match is found.

When a match is found, the action routines specified in the action field are executed in order from left to right.

When control returns to the syntax analyzer, it modifies the Top of Stack to agree with the New Top of Stack field.

(SLR) Parsing Tables for Expression Grammar

Grammar:
1) E → E+T
2) E → T
3) T → T*F
4) T → F
5) F → (E)
6) F → id

        Action Table                      Goto Table
state   id    +     *     (     )     $     E    T    F
0       s5                s4                1    2    3
1             s6                      acc
2             r2    s7          r2    r2
3             r4    r4          r4    r4
4       s5                s4                8    2    3
5             r6    r6          r6    r6
6       s5                s4                     9    3
7       s5                s4                          10
8             s6                s11
9             r1    s7          r1    r1
10            r3    r3          r3    r3
11            r5    r5          r5    r5

Actions of an (S)LR Parser: Example

stack       input        action              output
0           id*id+id$    shift 5
0id5        *id+id$      reduce by F→id      F→id
0F3         *id+id$      reduce by T→F       T→F
0T2         *id+id$      shift 7
0T2*7       id+id$       shift 5
0T2*7id5    +id$         reduce by F→id      F→id
0T2*7F10    +id$         reduce by T→T*F     T→T*F
0T2         +id$         reduce by E→T       E→T
0E1         +id$         shift 6
0E1+6       id$          shift 5
0E1+6id5    $            reduce by F→id      F→id
0E1+6F3     $            reduce by T→F       T→F
0E1+6T9     $            reduce by E→E+T     E→E+T
0E1         $            accept
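
The trace above can be reproduced with a short table-driven driver. The action and goto entries below are those of the SLR table for the expression grammar; the Python encoding (dictionaries, a stack holding only state numbers, rule numbers as output) is an assumption made for brevity.

```python
# Rule number -> (left-hand side, length of right-hand side)
GRAMMAR = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3),
           4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s5", "(": "s4"},
    5: {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}
GOTO = {0: {"E": 1, "T": 2, "F": 3}, 4: {"E": 8, "T": 2, "F": 3},
        6: {"T": 9, "F": 3}, 7: {"F": 10}}

def parse(tokens):
    """Return the list of rule numbers used, in order of reduction."""
    stack = [0]                      # only states are kept on the stack
    tokens = list(tokens) + ["$"]
    pos, output = 0, []
    while True:
        action = ACTION[stack[-1]].get(tokens[pos])
        if action is None:
            raise SyntaxError(f"unexpected {tokens[pos]!r} at position {pos}")
        if action == "acc":
            return output
        if action[0] == "s":         # shift: push the state, consume the token
            stack.append(int(action[1:]))
            pos += 1
        else:                        # reduce: pop |rhs| states, then goto
            rule = int(action[1:])
            head, length = GRAMMAR[rule]
            del stack[-length:]
            stack.append(GOTO[stack[-1]][head])
            output.append(rule)
```

For the input `id * id + id` the driver performs the reductions 6, 4, 6, 3, 2, 6, 4, 1, which is the output column of the trace read top to bottom.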

Phases of Compiler : Interpretation Phase

A collection of routines that are called when a construct is recognized in the syntactic phase

It creates an intermediate form of the source program

It adds information to the identifier table

Phases of Compiler : Interpretation Phase : Database

Database                 Format of an entry
Uniform Symbol Table     Table | Index
Identifier Table         Previous Info | Storage Class | Array Bound | Structure Info | Literal Value | Other information | Address
Matrix                   Operator | Operand 1 | Operand 2
Temporary Storage Table  Attributes of temporary computations: Data type | Base | Scale | Precision | Storage Class | Other information | Address

Phases of Compiler : Optimization Phase : Database

Database          Format of an entry
Matrix            Operator | Operand 1 | Operand 2 | Forward Pointer | Backward Pointer
Identifier Table  Previous Info | Storage Class | Array Bound | Structure Info | Literal Value | Other information | Address
Literal Table     Literal | Base | Scale | Precision | Other information | Address

Phases of Compiler : Optimization Phase : CSE Algorithm

Global: Common Subexpression Elimination (CSE)

Goal: eliminate recomputations of an expression

Rules:
1. X and Y have the same opcode and X dominates Y
2. src(X) = src(Y) for all srcs
3. For all srcs, no def of a src on any path between X and Y (excluding Y)
4. Insert rx = dest(X) immediately after X for new register rx
5. Replace Y with move dest(Y) = rx

[Figure: control-flow graph containing the operations r3 = r4 / r7, r2 = r2 + 1, r3 = r3 + 1, r1 = r3 * 7, r5 = r2 * r6, r8 = r4 / r7, r9 = r3 * 7 and r1 = r2 * r6; the annotations r10 = r3 and r8 = r10 mark the operations introduced by CSE]

Phases of Compiler : Optimization Phase : LSR Algorithm

Local: Strength Reduction

Goal: replace expensive operations with cheaper ones

Rules (common):
1. X is a multiplication operation where src1(X) or src2(X) is a constant 2^k integer literal
2. Change X by using a shift operation
3. For k=1, an add can be used

[Figure: r7 = 5, r5 = 2 * r4, r6 = r4 * 4; after strength reduction r5 = r4 + r4 and r6 = r4 << 2]
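
The rules above can be sketched as a small rewrite function. The representation is an assumption for illustration: register operands are strings, integer literals are Python ints, and the result is returned as text. The case k = 0 (multiplication by 1) is left unchanged because the rules do not cover it.

```python
def reduce_strength(dest, src1, src2):
    """Rewrite dest = src1 * src2 when one source is a 2**k integer literal."""
    if isinstance(src1, int):            # move the constant to the right
        src1, src2 = src2, src1
    is_power_of_two = (isinstance(src2, int) and not isinstance(src1, int)
                       and src2 > 1 and src2 & (src2 - 1) == 0)
    if is_power_of_two:
        k = src2.bit_length() - 1        # src2 == 2**k
        if k == 1:
            return f"{dest} = {src1} + {src1}"   # rule 3: use an add
        return f"{dest} = {src1} << {k}"         # rule 2: use a shift
    return f"{dest} = {src1} * {src2}"           # not applicable: unchanged
```

Applied to the two multiplications of the slide, `reduce_strength("r5", 2, "r4")` gives `r5 = r4 + r4` and `reduce_strength("r6", "r4", 4)` gives `r6 = r4 << 2`.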

Phases of Compiler : Optimization Phase : Code Motion Algorithm

Global: Code Motion

Goal: move loop-invariant computations to the preheader

Rules:
1. Operation X is in a block that dominates all exit blocks
2. X is the only operation to modify dest(X) in the loop body
3. All srcs of X have no defs in any of the basic blocks in the loop body
4. Move X to the end of the preheader

Note 1: if one src of X is a memory load, need to check for stores in the loop body
Note 2: X must be movable and not cause exceptions

[Figure: loop with preheader (r1 = 0) and header; the loop contains the operations r4 = M[r5], r7 = r4 * 3, r8 = r2 + 1, r7 = r8 * r4, r3 = r2 + 1, r1 = r1 + r7 and M[r1] = r3; the annotation marks r4 = M[r5] as the operation moved to the preheader]

For more details, visit our blogs:

Mahesh Sanghavi: https://maheshsanghavi.wordpress.com/

Kainjan Sanghavi: https://kainjan1.wordpress.com/

Thank You