
SYSTEM PROGRAMMING

Introduction

A software designer expresses ideas in terms related to the application domain of the software. To implement these ideas, their description has to be converted into terms related to the execution domain of the computer system.

Application Domain |------- Semantic Gap -------| Execution Domain

The semantic gap has many consequences. The most important are long development times, large development effort and poor software quality.

A programming language (PL) tackles these issues. Implementing software in a PL introduces a new domain, the PL domain.

Application Domain |-- Specification Gap --| PL Domain |-- Execution Gap --| Execution Domain

Language Processor: A Language processor is software which bridges a specification or execution gap.

Source Program (SP): The program that forms the input to a language processor. The language in which it is written is the source language (e.g. C, C++).

Target Program (TP): The output of a language processor. The language in which it is written is the target language (e.g. machine language, assembly language).

Language Translator: Bridges execution gap to machine language. E.g. Compiler, assembler

De-compiler / De-assembler / De-translator: Bridges the same execution gap as a translator, but in the reverse direction.

Preprocessor: A language processor which bridges an execution gap but is not a language translator.

C++ Program --> [ C++ Pre-processor ] --> C++/C Program             (errors reported)

C++ Program --> [ C++ Translator ]    --> Machine Language Program  (errors reported)

Interpreters: An interpreter is a language processor which bridges an execution gap without generating a machine language program.

Language Types

Problem Oriented Language: A classical solution is to develop a PL such that the PL domain is very close to, or identical with, the application domain. PL features now directly model aspects of the application domain. This leads to a small specification gap.

Such a PL can be used only for specific applications, hence these languages are called problem oriented languages. They have a large execution gap.

Application Domain |- Specification Gap -| PL Domain |---------- Execution Gap ----------| Execution Domain

Procedure Oriented Language: Provides general-purpose facilities required in most application domains. Such a language is independent of specific application domains, which results in a large specification gap. E.g. C, C++

Program Translation:

The program translation model bridges the execution gap by translating a program written in a PL, called the source program (SP), into an equivalent program in the machine language or assembly language of the computer system, called the target program (TP).

Source Program (SP) ----> [ Translator ] ----> Target Program (TP)
                                |                      |
                              Errors                  Data

Program Interpretation: The interpreter reads the source program and stores it in its memory. During interpretation it takes a source statement, determines its meaning and performs the actions which implement it.

Execution Cycle (of a CPU):

1. Fetch the instruction
2. Decode the instruction and determine the operation to be performed
3. Fetch the operands
4. Execute the instruction

Interpretation Cycle:

1. Fetch the statement
2. Analyze the statement and determine its meaning
3. Fetch the operands
4. Execute the meaning
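The interpretation cycle can be illustrated with a minimal sketch. The statement format (target, left operand, operator, right operand) and the function name `interpret` are assumptions made for this illustration only; they are not part of any real interpreter.

```python
def interpret(program, env):
    """Run a list of (target, left, op, right) statements without generating machine code."""
    meanings = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
    }
    pc = 0
    while pc < len(program):
        target, left, op, right = program[pc]                 # 1. fetch the statement
        action = meanings[op]                                 # 2. analyze it, determine its meaning
        a = env[left] if isinstance(left, str) else left      # 3. fetch the operands
        b = env[right] if isinstance(right, str) else right
        env[target] = action(a, b)                            # 4. execute the meaning
        pc += 1
    return env


# x = 2 + 3; y = x * 4
result = interpret([("x", 2, "+", 3), ("y", "x", "*", 4)], {})
```

Note that no target program is produced: the meaning of each statement is carried out directly, which is what distinguishes interpretation from translation.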


Language Processing

Language Processing = Analysis of SP + Synthesis of TP

We refer to the collection of language processor components engaged in analysing a source program as the analysis phase of the language processor.

The components engaged in synthesizing a target program constitute the synthesis phase.

Analysis Phase:

1. Lexical analyzer: applies the lexical rules, which govern the formation of valid lexical units.
2. Syntax analyzer: applies the syntax rules, which govern the formation of valid statements in the source language.
3. Semantic analyzer: applies the semantic rules, which associate meaning with valid statements of the language.

Synthesis Phase:

Synthesis phase is concerned with the construction of target language statements which have the same meaning as a source statement.

It consists of:

1. Creation of data structures in the target program
2. Generation of target code

Phases and passes of Language Processor:

Analysis of a source statement cannot be immediately followed by synthesis of the equivalent target statements, for two reasons:

1. Forward references
2. Memory requirements and language processor organization

Language Processor:

SP --> [ Analysis Phase ] --> [ Synthesis Phase ] --> TP
               |                       |
             Errors                  Errors

Forward Reference: A forward reference of a program entity is a reference to the entity which precedes its definition in the program.

e.g.

    Interest = (p * r * n) / 100;
    ...
    float Interest;

The reference to Interest in the assignment statement is a forward reference because the declaration of Interest occurs later in the program.

The data type of Interest is not known while the assignment statement is being processed, so correct code cannot be generated for it in a statement-by-statement manner.

This leads to the multi-pass model of language processing.

Language Processor Pass: A language processor pass is the processing of every statement in a source program, or its equivalent representation, to perform a language processing function (or a set of functions.)

e.g.

Pass I: Perform analysis of the source program and note relevant information
Pass II: Perform synthesis of the target program

Such a language processor performs certain processing more than once. In Pass I it analyses the source program to note the type information. In Pass II it analyses the source program again to generate target code using the type information noted in Pass I.
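The two passes can be sketched for the Interest example as follows. The tuple-based statement format and the pseudo-instruction name `STORE_FLOAT` are hypothetical, chosen only to show how type information noted in Pass I resolves the forward reference in Pass II.

```python
# Source statements in program order: the use of Interest precedes its declaration.
source = [
    ("assign", "Interest", "(p * r * n) / 100"),
    ("declare", "float", "Interest"),
]


def pass_one(statements):
    """Pass I: analyse every statement and note the declared type of each variable."""
    types = {}
    for stmt in statements:
        if stmt[0] == "declare":
            _, type_name, variable = stmt
            types[variable] = type_name
    return types


def pass_two(statements, types):
    """Pass II: process every statement again and generate type-specific code."""
    code = []
    for stmt in statements:
        if stmt[0] == "assign":
            _, variable, expression = stmt
            code.append(f"STORE_{types[variable].upper()} {variable} <- {expression}")
    return code


types = pass_one(source)
target = pass_two(source, types)
```

Pass II can pick a float store for the assignment even though the declaration appears later, because the whole program has already been scanned once.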

Intermediate representation: an intermediate representation (IR) is a representation of a source program which reflects the effect of some, but not all analysis and synthesis tasks performed during language processing.

Language Processor:

S.P. --> [ Analysis Phase ] --> Intermediate Repn. (IR) --> [ Synthesis Phase ] --> T.P.

The first pass performs analysis of the source program and reflects its results in the intermediate representation. The second pass reads and analyses the IR, instead of the source program, to perform synthesis of the target program. This avoids repeated processing of the source program.

The first pass is concerned exclusively with source language issues. It is also called the front end of the language processor.

The second pass is concerned with program synthesis for a specific target language. It is called the back end of the language processor.

Because they communicate only through the IR, the front end and back end of a language processor need not co-exist in memory. This reduces the memory requirements of the language processor.

Assemblers

Assembly language is a low-level, machine-dependent programming language. An assembly statement has the format:

[LABEL] <Operation-Code> <Operand1>, [<Operand2> …]

Features of assembly language:

1. Mnemonic operation codes: These free us from remembering numeric op-codes. They also make it easier to find mistakes in writing operation codes.

2. Symbolic operands: Symbolic names can be given to data items or instructions. The assembler binds these symbolic names to memory locations.

3. Data declarations: Data items can be declared in a convenient notation, so we do not have to convert them into machine format ourselves.

Instruction Op-Code    Mnemonic Op-Code
00                     STOP
01                     ADD
02                     SUB
03                     MULT
04                     MOVER
05                     MOVEM
06                     COMP
07                     BC
08                     DIV
09                     READ
10                     PRINT

Assembly program:

            START  101
            READ   N
            MOVER  BX, ONE
            MOVEM  BX, TERM
    LOOP    MULT   BX, TERM
            MOVER  CX, TERM
            ADD    CX, ONE
            MOVEM  CX, TERM
            COMP   CX, N
            BC     LE, LOOP
            MOVEM  BX, RESULT
            PRINT  RESULT
            STOP
    N       DS     1
    RESULT  DS     1
    ONE     DC     1
    TERM    DS     1
            END

Target program (address, op-code, register/condition code, memory operand):

    101   09 0 113
    102   04 2 115
    103   05 2 116
    104   03 2 116
    105   04 3 116
    106   01 3 115
    107   05 3 116
    108   06 3 113
    109   07 2 104
    110   05 2 114
    111   10 0 114
    112
    113
    114
    115   0 0 001
    116

Assembly language statements:

1. Imperative statements (e.g. ADD, MULT)
2. Declaration statements (e.g. DS, DC)
3. Assembler directives (e.g. START, END)

Assembler Phases:

1. Analysis Phase
2. Synthesis Phase

1. Analysis Phase

• Its primary function is the generation of the symbol table.
• It needs to associate an address with each program element (e.g. instructions, data).
• It uses a location counter (LC) for memory allocation. The LC contains the address of the next instruction of the target program.
• It checks the validity of each mnemonic op-code.
• It makes an entry in the symbol table when a new label is found in a statement.
• It finds the length of each instruction and adds it to the LC, so that the LC points to the address of the next instruction. This is also called LC processing.

Mnemonics Table

    Inst    Op-Code    Length
    ADD     01         2
    SUB     02         2

Symbol Table

    Symbol    Address
    LOOP      104
    N         113

[Figure: SP --> Analysis Phase --> Synthesis Phase --> TP. Both phases use the Mnemonics Table and the Symbol Table.]

2. Synthesis Phase

• Obtain the machine op-code corresponding to the mnemonic from the Mnemonics table.
• Obtain the address of a memory operand from the Symbol table.
• Synthesize a machine instruction, or the machine form of a constant, as needed.


PASS Structure of Assemblers

Two Pass Translation

Two pass translation of an assembly language program can handle forward references easily.

LC processing is done in the first pass, and the symbols defined in the program are entered in the Symbol Table.

The second pass synthesizes the target program using the Symbol Table.

The first pass performs analysis of the source program, while the second performs synthesis of the target program.

The first pass also constructs an IR of the source program.

Single Pass Translation

LC processing and construction of the symbol table are done in the same way as in two pass translation.

Forward references are tackled using a process called BACKPATCHING.

The operand field of an instruction containing a forward reference is left blank initially.

The address of the forward-referenced symbol is put into this field when its definition is encountered.

E.g. in MOVER BX, ONE the symbol ONE is a forward reference, so the address of the second operand is left blank initially.

The assembler makes an entry for the forward reference in the TII (Table of Incomplete Instructions), e.g. (101, ONE).

When the END statement is processed, the symbol table contains the addresses of all symbols and the TII contains information about all forward references.

The assembler now processes each entry of the TII to complete the instruction concerned.
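Backpatching can be sketched as below. This is a simplified illustration, assuming that every statement occupies one memory word and has at most one symbolic operand; the function name `assemble_single_pass` and the list-based instruction format are hypothetical.

```python
def assemble_single_pass(statements, start):
    """statements: list of (label, mnemonic, operand symbol or None)."""
    symtab = {}   # symbol -> address
    tii = []      # table of incomplete instructions: (instruction address, symbol)
    code = {}     # address -> [mnemonic, operand address]
    lc = start
    for label, mnemonic, operand in statements:
        if label:
            symtab[label] = lc                      # definition encountered
        if operand is None or operand in symtab:
            code[lc] = [mnemonic, symtab.get(operand)]
        else:
            code[lc] = [mnemonic, None]             # operand field left blank
            tii.append((lc, operand))               # note the forward reference
        lc += 1
    # END reached: all symbols are defined, so complete the blank fields.
    for address, symbol in tii:
        code[address][1] = symtab[symbol]
    return code, symtab, tii


code, symtab, tii = assemble_single_pass(
    [(None, "MOVER", "ONE"), (None, "STOP", None), ("ONE", "DC", None)],
    start=101,
)
```

Here the instruction at address 101 is first assembled with a blank operand and the entry (101, ONE) is placed in the TII; it is patched with the address 103 once the whole program has been read.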

Pass I of Assembler

Pass I uses the following data structures:

• OPTAB: table of mnemonic op-codes and related information
• SYMTAB: symbol table
• LITTAB: table of literals used in the program
• POOLTAB: table of literal pools

OPTAB

    Mnemonic op-code    Class                       Mnemonic info
    ADD                 IS (Imperative)             (01, 1)
    DS                  DL (Declarative)            R#7
    START               AD (Assembler Directive)    R#11

SYMTAB fields: symbol, address, length

LITTAB fields: literal, address

POOLTAB fields: literal no.

OPTAB contains the fields mnemonic op-code, class and mnemonic info. The class field indicates whether the op-code belongs to an imperative statement (IS), a declaration statement (DL) or an assembler directive (AD).

The literal table contains an entry for each literal.

Processing of an assembly statement begins with the processing of its label field. If it contains a symbol, the symbol and the value in the LC are entered in SYMTAB.

The assembler then looks up the OPTAB entry of the mnemonic and checks its class. If the statement is imperative, the length of the machine instruction is added to the LC. The length is also entered in the SYMTAB entry of the symbol (if any) defined in the statement.

If it is a declaration statement or an assembler directive, the routine named in the mnemonic info field is called to process it. For example, in the case of DS, routine no. 7 (R#7) is called. It finds the amount of memory required by the statement and updates the LC and the symbol table entry.

The first pass uses LITTAB to collect all literals used in the program. Awareness of the different literal pools is maintained using an auxiliary table, POOLTAB. This table contains the literal number of the starting literal of each literal pool.

SYMTAB

    Symbol    Address    Length
    Loop      202        1
    Next      214        1
    Last      216        1
    A         217        1

LITTAB

    No.    Literal    Address
    1      ='abc'
    2      ='2'
    3      :
           :

POOLTAB

    Literal no.
    #1
    #3
    --
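The interplay of LITTAB and POOLTAB can be sketched as follows. This is an illustration under stated assumptions: each literal occupies one word, list indices are 0-based (the tables above number literals from 1), and the class and method names are hypothetical.

```python
class LiteralTables:
    def __init__(self):
        self.littab = []     # entries [literal, address]; address is filled at LTORG/END
        self.pooltab = [0]   # index in littab of the first literal of each pool

    def add(self, literal):
        """Enter a literal in the current pool unless it is already there."""
        current_pool = self.littab[self.pooltab[-1]:]
        if all(entry[0] != literal for entry in current_pool):
            self.littab.append([literal, None])

    def allocate_pool(self, lc):
        """On LTORG or END: give addresses to the current pool and open a new pool."""
        for entry in self.littab[self.pooltab[-1]:]:
            entry[1] = lc
            lc += 1
        self.pooltab.append(len(self.littab))
        return lc            # updated location counter


tables = LiteralTables()
tables.add("='abc'")
tables.add("='2'")
tables.add("='2'")                     # duplicate within the pool is not re-entered
lc_after_first = tables.allocate_pool(200)
tables.add("='5'")                     # hypothetical third literal, starts the second pool
lc_after_second = tables.allocate_pool(300)
```

After the first pool is allocated, the second pool starts at index 2, which corresponds to literal #3 in the 1-based numbering of the POOLTAB shown above.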

PASS 1 ALGORITHM:

1. loc_cntr := 0 (default value); pooltab_ptr := 1; POOLTAB[1] := 1; littab_ptr := 1;

2. While the next statement is not an END statement
   a. If a label is present then
        this_label := symbol in label field;
        Enter (this_label, loc_cntr) in SYMTAB.
   b. If an LTORG statement then
        Allocate the literals; update LITTAB and POOLTAB;
        Update loc_cntr;
        pooltab_ptr := pooltab_ptr + 1;
   c. If a START or ORIGIN statement then
        loc_cntr := value of operand field;
   d. If an EQU statement then
        this_addr := value of the address specification;
        Correct the SYMTAB entry for this_label to (this_label, this_addr).
   e. If a declaration statement (DS/DC) then
        Calculate the storage size required by the DC/DS;
        Update loc_cntr by this size;
        Generate IC for the statement.
   f. If an imperative statement then
        code := machine op-code from OPTAB;
        Update loc_cntr by the length of the instruction;
        If the operand is a literal then enter it into LITTAB,
        else enter the symbol in SYMTAB;
        Generate IC for the statement.

3. (Processing of the END statement)
   a. Perform step 2b.
   b. Generate IC.
   c. Go to Pass II.

PASS II ALGORITHM:

1. code_area_address := address of code area; pooltab_ptr := 1; loc_cntr := 0;

2. While the next statement is not an END statement
   a. Clear machine_code_buffer.
   b. If a label is present then
        this_label := symbol in label field (its SYMTAB entry was already made in Pass I).
   c. If an LTORG statement then
        Process the literals in the same way as constants in a DC statement, i.e. assemble the literals in machine_code_buffer;
        size := size of memory area required for the literals;
        pooltab_ptr := pooltab_ptr + 1;
   d. If a START or ORIGIN statement then
        loc_cntr := value of operand field;
        size := 0;
   e. If a declaration statement (DS/DC) then
        If a DC statement then assemble the constant in machine_code_buffer;
        size := size of memory area required by the DC/DS;
   f. If an imperative statement then
        Get the operand address from SYMTAB or LITTAB;
        Assemble the instruction in machine_code_buffer;
        size := size of the instruction;
   g. If size <> 0 then
        Move the contents of machine_code_buffer to the address code_area_address + loc_cntr;
        loc_cntr := loc_cntr + size;

3. (Processing of the END statement)
   a. Perform steps 2c and 2g.
   b. Write code_area into the output file.
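The two passes can be tried out on the sample program given earlier in these notes. The sketch below is a simplification, not the full algorithms: it ignores literals, LTORG, ORIGIN and EQU, keeps no intermediate code, and assumes that every instruction and DC occupies one word. The register and condition codes (BX = 2, CX = 3, LE = 2) are read off the target program listing; the function and table names are hypothetical.

```python
OPTAB = {"STOP": 0, "ADD": 1, "SUB": 2, "MULT": 3, "MOVER": 4, "MOVEM": 5,
         "COMP": 6, "BC": 7, "DIV": 8, "READ": 9, "PRINT": 10}
FIRST_OPERAND = {"BX": 2, "CX": 3, "LE": 2}   # codes as they appear in the listing

PROGRAM = [  # (label, mnemonic, operands)
    ("", "START", "101"), ("", "READ", "N"), ("", "MOVER", "BX,ONE"),
    ("", "MOVEM", "BX,TERM"), ("LOOP", "MULT", "BX,TERM"), ("", "MOVER", "CX,TERM"),
    ("", "ADD", "CX,ONE"), ("", "MOVEM", "CX,TERM"), ("", "COMP", "CX,N"),
    ("", "BC", "LE,LOOP"), ("", "MOVEM", "BX,RESULT"), ("", "PRINT", "RESULT"),
    ("", "STOP", ""), ("N", "DS", "1"), ("RESULT", "DS", "1"),
    ("ONE", "DC", "1"), ("TERM", "DS", "1"), ("", "END", ""),
]


def pass_one(program):
    """Pass I: LC processing; build SYMTAB as {symbol: address}."""
    symtab, lc = {}, 0
    for label, mnemonic, operands in program:
        if mnemonic == "END":
            break
        if mnemonic == "START":
            lc = int(operands)
            continue
        if label:
            symtab[label] = lc
        # DS reserves as many words as its operand says; all else is one word here.
        lc += int(operands) if mnemonic == "DS" else 1
    return symtab


def pass_two(program, symtab):
    """Pass II: synthesize {address: (op-code, first operand code, memory address)}."""
    code, lc = {}, 0
    for label, mnemonic, operands in program:
        if mnemonic == "END":
            break
        if mnemonic == "START":
            lc = int(operands)
        elif mnemonic == "DS":
            lc += int(operands)                 # storage only, nothing to assemble
        elif mnemonic == "DC":
            code[lc] = (0, 0, int(operands))    # machine form of the constant
            lc += 1
        else:
            parts = operands.split(",") if operands else []
            first = FIRST_OPERAND[parts[0]] if len(parts) == 2 else 0
            address = symtab[parts[-1]] if parts else 0
            code[lc] = (OPTAB[mnemonic], first, address)
            lc += 1
    return code


symtab = pass_one(PROGRAM)
code = pass_two(PROGRAM, symtab)
```

The forward references to N, ONE, TERM and RESULT cause no difficulty because SYMTAB is complete before any instruction is synthesized.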

A SINGLE PASS ASSEMBLER FOR IBM PC

This section describes a single pass assembler for the Intel 8088 processor used in the IBM PC.

The architecture of Intel 8088

The Intel 8088 microprocessor supports 8 and 16 bit arithmetic, and also provides special instructions for string manipulation.

The CPU contains the following features:

• Data registers AX, BX, CX and DX
• Index registers SI and DI
• Stack pointer registers BP and SP
• Segment registers Code, Stack, Data and Extra

    Data Registers       AX (AH, AL), BX (BH, BL), CX (CH, CL), DX (DH, DL)
    Stack Registers      BP, SP
    Index Registers      SI, DI
    Segment Registers    CS, DS, SS, ES

The Intel 8088 provides addressing capability for 1 MB of primary memory.

The memory is used to store three components of a program, program code, data and stack. The Code, Stack and Data segment registers are used to contain the start addresses of these three components.

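The address of a particular instruction or data item is obtained by adding the start address of its segment to a 16-bit logical offset. The segment start address is the 16-bit value of the segment register shifted left by 4 bits (i.e. multiplied by 16), which gives a 20-bit address. Since the offset is 16 bits wide, the size of a segment cannot exceed 2^16 bytes = 64 KB.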
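The address computation can be written out as a short sketch. The function name `physical_address` is chosen for this illustration; the sample register values are arbitrary.

```python
def physical_address(segment, offset):
    """20-bit 8088 address from a 16-bit segment register value and a 16-bit offset."""
    return ((segment << 4) + offset) & 0xFFFFF   # keep 20 bits (1 MB address space)


# Segment register = 1234H, offset = 0010H
address = physical_address(0x1234, 0x0010)

# The largest offset is FFFFH, so one segment spans at most 64 KB.
segment_span = physical_address(0x1000, 0xFFFF) - physical_address(0x1000, 0x0000) + 1
```

With these values the segment starts at 12340H and the addressed byte lies at 12350H.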

A large program may contain more than one segment.

[Figure: RAM (1 MB) containing a code segment and a data segment]

8088 Addressing Modes:

    Addressing Mode      Example                    Description

    immediate            MOV SUM, 1234H             data = 1234H
    register             MOV SUM, AX                AX contains the data
    direct               MOV SUM, [1234H]           data at address 1234H
    register indirect    MOV SUM, [BX]              data at the address contained in BX
                         MOV SUM, CS:[BX]
    base                 MOV SUM, 12H [BX]          data at address 12H + (BX)
    index                MOV SUM, 34H [SI]          data at address 34H + (SI)
    base and index       MOV SUM, 56H [SI] [BX]     data at address 56H + (SI) + (BX)

Intel 8088 instructions:

Arithmetic Instructions:

    ADD AL, BL
    ADD AL, 12H [SI]
    ADD AX, 3456H

Control Transfer Instructions

Branching Example

Move 80 bytes from the source address in SI to the destination address in DI:

    MOV SI, 100h
    MOV DI, 200h
    MOV CX, 50h      ; 50h = 80 bytes
    CLD
    REP MOVSB

Assembler Directives

Declarations

    A DB 25
    B DW ?
    C DD 6 DUP(0)

EQU and PURGE

    XYZ DB ?
    ABC EQU XYZ      ; ABC is a new name for XYZ
    ….
    PURGE ABC        ; ABC no longer stands for XYZ
    ABC EQU 25

SEGMENT, ENDS

SEGMENT and ENDS are used to declare the start and end of a segment.

    CODESG SEGMENT
    …
    instructions
    …
    CODESG ENDS
    DATASG SEGMENT
    DATASG ENDS

ASSUME

    ASSUME <segment register> : <segment name>

e.g.

    ASSUME DS : DATASG

PROC and ENDP

PROC and ENDP delimit the body of a procedure.

    FACTORIAL PROC
    FACTORIAL ENDP

Example Assembly Language Program

    Sr No.   Statement                               Offset

    001      CODE    SEGMENT
    002              ASSUME CS:CODE, DS:DATA
    003              MOV AX, DATA                    0000
    004              MOV DS, AX                      0003
    005              MOV CX, LENGTH STRNG            0005
    006              MOV COUNT, 0000                 0008
    007              MOV SI, OFFSET STRNG            0011
    008              ASSUME ES:DATA, DS:NOTHING
    009              MOV AX, DATA                    0014
    010              MOV ES, AX                      0017
    011      COMP:   CMP [SI], 'A'                   0019
    012              JNE NEXT                        0022
    013              MOV COUNT, 1                    0024
    014      NEXT:   INC SI                          0027
    015              DEC CX                          0029
    016              JNE COMP                        0030
    017      CODE    ENDS
    018      DATA    SEGMENT
    019              ORG 1
    020      COUNT   DB ?                            0001
    021      STRNG   DW 50 DUP(?)                    0002
    022      DATA    ENDS
    023              END

Forward Reference:

A symbolic name may be forward referenced in a variety of ways. When it is used as a data operand in a statement, its assembly is straightforward: an entry can be made in the table of incomplete instructions (TII).

This entry identifies the bytes in the code where the address of the referenced symbol should be put. When the symbol's definition is encountered, this entry is analyzed to complete the instruction.

For example, when the definition of COUNT is encountered in statement 20, information concerning the forward references to it can be found in the table of incomplete instructions.

Segment Registers table:

An ASSUME statement indicates that a segment register contains the base address of a segment. The assembler represents this information by a pair of the form (segment register, segment name).

This information can be stored in a segment registers table (SRTAB). SRTAB is updated on processing an ASSUME statement. To process a reference to a symbol symb in an assembly statement, the assembler accesses the symbol table entry of symb and finds the pair (segsymb, offsetsymb), where segsymb is the name of the segment containing the definition of symb.

It uses the information in SRTAB to find the register which contains the base address of segsymb. Let it be register r. It now synthesizes the pair (r, offsetsymb). This pair is put in the address field of the target instruction.

A new SRTAB is created while processing an ASSUME statement. This SRTAB differs from the old SRTAB only in the entries for the segment registers named in the ASSUME statement. Since many SRTABs exist at any time, an array named SRTAB_ARRAY is used to store them. This array is indexed using a counter srtab_no.

Forward Reference Table

Instead of the TII, a forward reference table (FRT) is used. Each entry of the FRT contains the following fields:

(a) Address of the instruction whose operand field contains the forward reference

(b) Symbol to which the forward reference is made

(c) Kind of reference (e.g. T: analytic operator TYPE, D: data address, S: self relative address, L: length, F: offset, etc.)

(d) Number of the SRTAB to be used for assembling the reference.

LC Processing

LC processing in this algorithm differs from LC processing in the first pass of a two pass assembler (see Algorithm 4.1) in one significant respect. In the Intel 8088, the unit of memory allocation is a byte; however, certain entities require their starting byte to be aligned on specific boundaries in the address space.

For example, a word requires alignment on an even boundary, i.e. it must have an even start address. Such alignment requirements may force some bytes to be left unused during memory allocation. Hence, while processing declarations and imperative statements, the assembler first aligns the address in the LC on the required boundary and only then allocates memory.
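LC alignment amounts to rounding the LC up to the next multiple of the required boundary. A minimal sketch, in which the function name `align_lc` and the boundary parameter (1 for a byte, 2 for a word) are assumptions made for illustration:

```python
def align_lc(lc, boundary):
    """Round lc up to the next multiple of boundary (1 = byte, 2 = word)."""
    return (lc + boundary - 1) // boundary * boundary


# Data segment of the example program: ORG 1, then COUNT (a byte), then STRNG (words).
lc = 1
count_offset = align_lc(lc, 1)       # a byte needs no alignment
lc = count_offset + 1                # COUNT occupies one byte
strng_offset = align_lc(lc, 2)       # a word must start on an even address

# A word declared at an odd LC value leaves one byte unused.
skipped = align_lc(5, 2) - 5
```

The offsets obtained for COUNT and STRNG agree with those shown in the Offset column of the example program.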

SRTAB

Two SRTABs would be built for the program of Fig. 4.25. SRTAB#1 contains the pairs (CS, CODE) and (DS, DATA), while SRTAB#2 contains the pairs (CS, CODE) and (ES, DATA). While processing statement 6, SRTAB#1 is the current SRTAB. Hence the FRT entry for this statement is (008, COUNT, D, SRTAB#1).

Similarly, the FRT entry for statement 13 is (024, COUNT, D, SRTAB#2). These entries are processed on encountering the definition of COUNT, giving the address pairs (DS, 001) and (ES, 001). (Note that FRT entries would also exist for statements 5, 7 and 12. However, none of them require the use of a base register.)
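The processing of these two FRT entries can be sketched as follows. The dictionary layouts for SRTAB_ARRAY and SYMTAB and the function name `resolve` are hypothetical; only the table contents are taken from the discussion above.

```python
# SRTABs built by the two ASSUME statements, indexed by srtab_no.
srtab_array = {
    1: {"CS": "CODE", "DS": "DATA"},
    2: {"CS": "CODE", "ES": "DATA"},
}
# SYMTAB entry made when the definition of COUNT is encountered.
symtab = {"COUNT": {"segment": "DATA", "offset": 1}}
# FRT entries: (instruction address, symbol, kind of reference, SRTAB number).
frt = [(8, "COUNT", "D", 1), (24, "COUNT", "D", 2)]


def resolve(frt_entry):
    """Return (instruction address, (segment register, offset)) for one FRT entry."""
    address, symbol, _kind, srtab_no = frt_entry
    entry = symtab[symbol]
    for register, segment in srtab_array[srtab_no].items():
        if segment == entry["segment"]:
            return address, (register, entry["offset"])
    raise LookupError(f"segment {entry['segment']} is not addressable")


patches = [resolve(entry) for entry in frt]
```

The same symbol is assembled with DS in one instruction and ES in the other, because each FRT entry remembers which SRTAB was current when the reference was made.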

[Flowchart: single pass assembler for the Intel 8088. Recoverable steps: clear ERRTAB; read the next statement; look up the mnemonic in MOT and call the appropriate routine. Declarations: align LC, allocate storage, convert constant. ASSUME: seal the present SRTAB and form a new SRTAB. Imperatives: obtain the machine op-code and data alignment requirements, manipulate LC, evaluate operand expressions, test for a literal operand or an operand symbol; an operand symbol is entered in CRT; if it is already defined, check alignment and addressability and generate the segment base and offset for the symbol, otherwise enter SRTAB #, instruction address, assignment code and statement # in FRT; assemble the instruction and put it into the target code buffer. If a label is present, enter it in SYMTAB and CRT. Then update LC, process forward references, list the statement, report errors and clear ERRTAB. At END: report undefined symbols and FRT entries, process the cross-reference listing, STOP.]

1. code_area_address := address of code_area;

srtab_no := 1;

LC := 0;

stmt_no := 1;

SYMTAB_segment_entry := 0;

Clear ERRTAB, SRTAB_ARRAY.

2. While next statement is not an END statement

(a) Clear machine_code_buffer

(b) if label is present then this_label := symbol in label field;

(c) If an EQU statement

(i) this_addr := value of operand expression;

(ii) Make an entry for this_label in SYMTAB with

offset := this_addr;

Defined := 'yes';

owner_segment := owner_segment of the operand symbol;

source_stmt_no := stmt_no;

(iii) Enter stmt_no in the CRT list of the label in the operand field.

(iv) Process forward references to this_label;

(v) size := 0;

(d) If an ASSUME statement

(i) Copy the SRTAB in SRTAB_ARRAY[srtab_no] into SRTAB_ARRAY[srtab_no + 1];

(ii) srtab_no := srtab_no + 1;

(iii) this_register := register mentioned in the statement;

(iv) this_segment := entry number of the SYMTAB entry of the segment appearing in the operand field;

(v) Make the entry (this_register, this_segment) in SRTAB_ARRAY[srtab_no]. (This overwrites an existing entry for this_register.)

(vi) size := 0;

(e) If a SEGMENT statement

(i) Make an entry for this_label in SYMTAB.

(ii) Set segment name? := true;

(iii) SYMTAB_segment_entry := entry no. in SYMTAB;

(iv) LC := 0;

(v) size := 0;

(f) If an ENDS statement then

SYMTAB_segment_entry := 0;

(g) If a declaration statement

(i) Align LC according to the specification in the operand field.

(ii) Assemble the constant(s), if any, in the machine_code_buffer.

(iii) size := size of memory area required;

(h) If an imperative statement

(i) If the operand is a symbol symb then

enter stmt_no in the CRT list of symb.

(ii) If the operand symbol is already defined then

Check its alignment and addressability.

Generate the address specification (segment register, offset) for the symbol using its SYMTAB entry and SRTAB_ARRAY[srtab_no].

else

Make an entry for the symbol in SYMTAB with Defined := 'no';

Enter (srtab_no, LC, usage code, stmt_no) in FRT.

(iii) Assemble the instruction in machine_code_buffer.

(iv) size := size of the instruction;

(i) If size ≠ 0 then

(i) If a label is present then

Make an entry for this_label in SYMTAB with owner_segment := SYMTAB_segment_entry; Defined := 'yes';

offset := LC;

source_stmt_no := stmt_no;

(ii) Move the contents of machine_code_buffer to the address code_area_address;

(iii) code_area_address := code_area_address + size;

(iv) Process forward references to the symbol. Check for alignment and addressability errors. Enter errors in ERRTAB.

(v) List the statement along with the errors contained in ERRTAB.

(vi) Clear ERRTAB.

Compilers

Aspects of Compilation

A compiler bridges the semantic gap between the PL domain and the execution domain. Two aspects of the compilation process are:

1. Generate code to implement the meaning of the source program in the execution domain.
2. Provide diagnostics for violations of PL semantics in the source program.

PL Features:

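1. Data Types 2. Data Structures 3. Scope Rules 4. Control Structures

1. Data Type: A data type is the specification of
a. legal values for variables of the type, and
b. legal operations on the legal values of the type.

E.g.

    int i, j;            int i, x[10];
    j = i + 2;           x = i + 5;

The first assignment is legal. The second violates the type rules because x is an array, so the compiler must report it.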

2. Data Structures:

A language permits the declaration and use of data structures like arrays, stacks, records and linked lists.

To compile a reference to an element of a data structure, the compiler must develop a memory mapping strategy to access the memory allocated to the element.

e.g. arrays , structures etc.

3. Scope Rules:

Scope rules determine the accessibility of variables declared in different blocks of a program. The scope of a program entity is that part of the program where the entity is accessible.

e.g.

    main()
    {
        int i = 2;
        {
            int i = 5;
        }
    }

4. Control Structures:

The control structure of a language is the collection of language features for altering the flow of control during program execution. This includes branching, conditional execution, iteration control and procedure or function calls.

Memory Allocation Strategy of Compilers

It involves three tasks:

1. Determine the amount of memory required to represent the value of a data item.
2. Use an appropriate memory allocation model to implement the lifetimes and scopes of data items.
3. Determine appropriate memory mappings to access the values in a non-scalar data item, e.g. the value of an array element.

The first task is accomplished during semantic analysis of data declaration statements.

Memory allocation models:

1. Static and dynamic memory allocation
2. Memory allocation in block structured languages

Definition: A memory binding is an association between the memory address attribute of a data item and the address of a memory area.

Memory allocation is the procedure used to perform memory binding. The binding ceases to exist when the memory is de-allocated.

Memory binding can be static or dynamic in nature.

In static memory allocation, memory is allocated to a variable before the execution of the program begins. Static memory allocation is typically performed during compilation. No memory allocation or de-allocation is performed during execution, so variables remain permanently allocated; the allocation to a variable exists even if the program unit in which it is defined is not active.

In dynamic memory allocation, memory bindings are established and destroyed during execution of program.

A typical example of static memory allocation is FORTRAN. Dynamic memory allocation is used in PL/I, Pascal, Ada, C etc.

Dynamic memory allocation has two flavours: 1. automatic allocation and 2. program controlled allocation.

Automatic allocation: Memory binding is performed at the execution start time of a program unit. Memory is allocated to the variables declared in a program unit when the unit is entered during execution, and is de-allocated when the unit is exited.

Program controlled allocation: Memory binding is performed during the execution of a program unit. A program can allocate and de-allocate memory at arbitrary points during its execution.

In dynamic memory allocation, the addresses of the memory areas of variables cannot be determined at compilation time.

Dynamic memory allocation is implemented using stacks and heaps. Variables are therefore accessed through pointers, which tends to be slower than access to statically allocated variables.

Automatic dynamic memory allocation is done using stack (LIFO) structure.

Program controlled dynamic memory allocation is implemented using a heap.

Dynamic memory allocation advantages:

1. Recursion can be implemented easily because memory is allocated when a program unit is entered during execution.

2. Dynamic memory allocation can also support data structures whose sizes are determined dynamically, e.g. an array x[m,n].

[Figure: Static and dynamic memory allocation for program units A, B and C. Under static allocation the code and data areas of A, B and C are all allocated throughout execution. Under dynamic allocation the code of A, B and C is always present, but only the data areas of the active units are allocated: Data A alone, then Data A and Data B, then Data A and Data C.]

Memory allocation in Block structured languages:

A block is a program unit which can contain data declarations. A program in a block structured language is a nested structure of blocks. Block structured language uses dynamic memory allocation.

Scope Rule:

A variable declared in a block is accessible in that block and in all its inner blocks, provided the same name is not re-declared in an inner block.

Scope of a variable:

If a variable var_i is created with the name name_i in a block B, then
a. var_i can be accessed in any statement situated in block B.
b. var_i can be accessed in any statement situated in a block B' which is enclosed in B, unless B' contains a declaration using the same name.

Memory allocation and access

Automatic dynamic memory allocation is implemented using an extended stack model. Each record in the stack has two reserved pointers.

Each stack record accommodates the variables for one activation of a block, hence we call it an activation record (AR).

[Figure: an activation record in the stack. ARB points to the start of the record; the two reserved pointers are at 0(ARB) and 1(ARB), followed by the variable X; TOS points to the top of the stack.]

AR_A^i == activation record for the i-th activation of block A

Dynamic Pointer: The first reserved pointer in a block's AR points to the activation record of its parent. This is called the dynamic pointer and has the address 0(ARB).

Dynamic pointer is used for de-allocating an AR.

Sample Program

    {
        int x;
        {
            char y;
            {
                int z, w;
                z = 5;
                x = z;
            }
        }
    }

[Figure: Dynamic allocation and access. The stack holds ARa (containing x), ARb (containing y) and ARc (containing z and w), each with its reserved pointers; ARB points to ARc and TOS to the top of the stack.]

Accessing Non-local Variables:

A non-local variable nl_var of a block b_use is a local variable of some block b_defn enclosing b_use.

According to the rules of a block structured language, when b_use is in execution, b_defn must be active. Hence AR(b_defn) exists in the stack and nl_var is to be accessed as

start address of AR(b_defn) + d(nl_var)

where d(nl_var) is the displacement of nl_var in AR(b_defn).

Static Pointer:

Access to non-local variables is implemented using second reserved pointer in Activation Record (AR). This pointer which has address 1 (ARB) is called static pointer.
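The use of the two reserved pointers can be sketched for the sample program. This is an illustration under assumptions: the static pointer is taken to refer to the AR of the textually enclosing block, variables are looked up by name at run time (a compiler would instead use a level difference and a displacement fixed at compile time), and the class and function names are hypothetical.

```python
class ActivationRecord:
    def __init__(self, dynamic, static, variables):
        self.dynamic = dynamic                     # 0(ARB): AR of the parent on the stack
        self.static = static                       # 1(ARB): AR of the enclosing block
        self.vars = dict.fromkeys(variables)       # local variables of this activation


def find_record(ar, name):
    """Follow static pointers until the record that declares name is reached."""
    while ar is not None:
        if name in ar.vars:
            return ar
        ar = ar.static
    raise NameError(name)


# Entering the three nested blocks of the sample program pushes three records.
ar_a = ActivationRecord(None, None, ["x"])
ar_b = ActivationRecord(ar_a, ar_a, ["y"])
ar_c = ActivationRecord(ar_b, ar_b, ["z", "w"])

find_record(ar_c, "z").vars["z"] = 5               # z = 5 (local access)
find_record(ar_c, "x").vars["x"] = ar_c.vars["z"]  # x = z (non-local access)

# Leaving the innermost block: the dynamic pointer gives the new top record.
top_after_exit = ar_c.dynamic
```

The assignment x = z reaches x by following two static pointers from ARc, while de-allocation of ARc needs only its dynamic pointer.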

Displays:

When the difference in nesting levels is large, it is expensive to access non-local variables using static pointers. A display is an array used to improve the efficiency of non-local accesses.
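The saving can be seen in a small sketch. It assumes that display[k] holds the activation record of the active block at nesting level k, with level 0 as the outermost block; the dictionary layout and function names are hypothetical.

```python
# Activation records as dicts; "static" links a record to the enclosing block's record.
ar_a = {"static": None, "vars": {"x": 0}}
ar_b = {"static": ar_a, "vars": {"y": "q"}}
ar_c = {"static": ar_b, "vars": {"z": 5, "w": 0}}


def via_static_chain(ar, level_difference):
    """Follow one static pointer per nesting level; cost grows with the difference."""
    hops = 0
    for _ in range(level_difference):
        ar = ar["static"]
        hops += 1
    return ar, hops


# display[k] = record of the active block at nesting level k.
display = [ar_a, ar_b, ar_c]


def via_display(level):
    """A single indexed access, whatever the level difference."""
    return display[level]


record, hops = via_static_chain(ar_c, 2)   # reach x from the innermost block
same_record = via_display(0)
```

Both routes reach the record of the outermost block, but the display does so with one array access instead of one pointer dereference per level.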

Symbol Table Requirements:

The symbol table is required to implement dynamic memory allocation and access.

Recursion

Recursive procedures are characterized by the fact that many invocations of a procedure co-exist during the execution of a program.

A copy of the local variables must be allocated for each invocation of the procedure.

This can be done using the extended stack model of memory allocation.

Stack Based Memory Allocations Limitations/Difference:

1. Stack based allocation is not adequate for program controlled memory allocation using malloc and free. The compiler must use a heap for such allocations.

2. Access to heap-allocated variables is implemented through pointers associated with individual variables instead of through an AR.

3. Garbage collection or free lists are used to implement reuse of de-allocated memory.

4. A stack is also inadequate for multi-activity or multi-threaded programs.

COMPILATION OF EXPRESSIONS:

a * b + c – d / e

Issues in code generation for expressions are:

1. Determination of the evaluation order of the operators in an expression
2. Selection of the instructions to be used in the target code
3. Use of registers and handling of partial results

The evaluation order depends on operator precedence: an operator which has a higher precedence than its neighbors must be evaluated before them.

The choice of an instruction in the target code depends upon:

1. The type and length of each operand
2. The addressability of each operand, i.e. where the operand is located and how it can be accessed.

Operand Descriptor is used to maintain type, length and addressability of each operand.

Partial result is the value of some sub-expression computed while evaluating an expression.

For efficiency partial results are maintained in CPU registers but some results have to be moved to memory when none of CPU registers are free.

Register Descriptor is used to maintain details of which partial result is stored in register.

Operand Descriptor: It contains following fields:

1. Attributes: Contains the type, length and miscellaneous information.
2. Addressability: Specifies where the operand is located and how it can be accessed.
   a. Addressability code: 'M' for an operand in memory and 'R' for an operand in a register.
   b. Address: Address of the CPU register or memory word.

Code generation for Expression a * b

    MOVER AREG, A
    MULT  AREG, B

Operand Descriptor:

    1    (int, 1)    M, addr(a)       descriptor of a
    2    (int, 1)    M, addr(b)       descriptor of b
    3    (int, 1)    R, addr(AREG)    descriptor of a * b

Register Descriptor:

A register descriptor has two fields:

1. Status: Contains the code "free" or "occupied" to indicate the register status.
2. Operand descriptor #: If status = "occupied", this field contains the descriptor # of the operand contained in the register.

e.g. Register descriptor for AREG after generating code for a * b:

    Occupied    #3

Generation of Instruction:

When an operator op is reduced by the parser, the function codegen is called with op and the descriptors of its operands as parameters.

A single instruction can be generated to evaluate op if one of the operands is in a register and the other is in memory.

If both operands are in memory, an instruction is first generated to move one of them into a register, and then op is evaluated.

Saving Partial Results:

If all registers are occupied when operator op is to be evaluated, a register r is freed by copying its contents to a temporary location in memory.

codegen (operator, op1, op2)                /* e.g. codegen('*', a, b) */
{
    if op1.Addressability_code = 'R'                    /* case 1 */
        if operator = '+'
            Generate 'ADD AREG, op2'                    /* same for other operators */
    else if op2.Addressability_code = 'R'               /* case 2 */
        if operator = '+'
            Generate 'ADD AREG, op1'                    /* same for other operators */
    else                                                /* case 3 */
        if Register_Descriptor.status = 'Occupied'      /* save partial result */
            Generate 'MOVEM AREG, Temp[j]'
            Operand_Descriptor[Register_Descriptor.Descriptor#] = (<type>, ('M', Addr(Temp[j])))
            j = j + 1
        Generate 'MOVER AREG, op1'
        if operator = '+'
            Generate 'ADD AREG, op2'                    /* same for other operators */

    /* Create a new descriptor: the operand value is in register AREG */
    i = i + 1
    Operand_Descriptor[i] = (<type>, ('R', Addr(AREG)))
    Register_Descriptor = (Occupied, i)
    return i
}

Consider an expression:

a * b + c * d * (e + f) + c * d

                  +
                /   \
              +       *
            /   \    / \
           *     *  c   d
          / \   / \
         a   b *   +
              / \ / \
             c  d e  f

Intermediate code for Expression

Postfix String Conversion:

consider expression: a + b * c + d * e ^ f

Postfix Expression: a b c * + d e f ^ * +

Postfix strings are very popular in non-optimizing compilers due to their ease of generation and use. Code generation from a postfix string can be performed using a stack of operand descriptors.

Operand descriptors are pushed on the stack as operands appear in the string. When an operator with arity k appears in the string, k descriptors are popped off the stack. The descriptor for the partial result is then pushed on the stack.

So in the example above, the stack contains the descriptor of a and the descriptor of the partial result b * c when the first '+' is encountered.
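The stack-based scheme can be sketched in Python as follows. This is only an illustration under simplifying assumptions: a single register AREG, only the commutative operators + and *, and temporaries named TEMP1, TEMP2, ... The names Operand and gen_from_postfix are hypothetical and do not come from these notes.

```python
from dataclasses import dataclass

@dataclass
class Operand:
    code: str   # addressability code: 'M' = in memory, 'R' = in register
    addr: str   # symbolic name of the memory word or register

OPCODES = {'+': 'ADD', '*': 'MULT'}   # assumption: commutative operators only

def gen_from_postfix(postfix):
    """Generate single-register code from a list of postfix tokens."""
    code, stack, temps = [], [], 0
    in_reg = None                       # descriptor of the value held in AREG
    for tok in postfix:
        if tok not in OPCODES:          # operand: push its descriptor
            stack.append(Operand('M', tok))
            continue
        op2, op1 = stack.pop(), stack.pop()
        if op1.code == 'R':             # case 1: first operand is in AREG
            code.append(f"{OPCODES[tok]} AREG, {op2.addr}")
        elif op2.code == 'R':           # case 2: second operand is in AREG
            code.append(f"{OPCODES[tok]} AREG, {op1.addr}")
        else:                           # case 3: both operands are in memory
            if in_reg is not None:      # save the partial result first
                temps += 1
                code.append(f"MOVEM AREG, TEMP{temps}")
                in_reg.code, in_reg.addr = 'M', f"TEMP{temps}"
            code.append(f"MOVER AREG, {op1.addr}")
            code.append(f"{OPCODES[tok]} AREG, {op2.addr}")
        in_reg = Operand('R', 'AREG')   # descriptor of the new partial result
        stack.append(in_reg)
    return code

for line in gen_from_postfix("a b * c d * +".split()):
    print(line)
```

For a * b + c * d the sketch saves the partial result a * b in TEMP1 before evaluating c * d, and the final ADD takes TEMP1 as its memory operand.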

Triples: A triple is representation of an elementary operation in the form of pseudo-machine instruction.

Operator Operand1 Operand2

Each operand of a triple is either a variable or constant or the result of some evaluation represented by other TRIPLE.

Expression: a b c * + d e f ^ * +

1   *   b   c
2   +   1   a
3   ^   e   f
4   *   d   3
5   +   2   4
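Triples can be produced from a postfix string with the same stack discipline. The following Python sketch is an illustration only; the function name postfix_to_triples is hypothetical. An integer operand refers to the result of the triple with that number, a string operand is a variable.

```python
def postfix_to_triples(postfix, operators=("+", "-", "*", "/", "^")):
    """Return a list of (operator, operand1, operand2) triples.

    Triples are numbered from 1 in the order in which they are generated.
    """
    triples, stack = [], []
    for tok in postfix:
        if tok in operators:
            right, left = stack.pop(), stack.pop()
            triples.append((tok, left, right))
            stack.append(len(triples))   # reference to the new triple
        else:
            stack.append(tok)            # variable or constant
    return triples

for number, triple in enumerate(postfix_to_triples("a b c * + d e f ^ * +".split()), 1):
    print(number, *triple)
```

The sketch lists the operands of each triple in source order, so its second triple reads + a 1 where the table above shows + 1 a; the two forms denote the same addition.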

Expression Trees

Postfix Evaluation order may not lead to most efficient code for an expression. So compiler must analyze an expression to find out best evaluation order for its operators.

Expression tree is a syntax tree which depicts the structure of an expression.

COMPILATION OF CONTROL STRUCTURES

Control structure of a programming language is the collection of language features which governs the sequencing of control through a program.

Control Transfer, Conditional Execution and iterative constructs:

Control transfer is implemented through conditional and unconditional gotos. When the target language of a compiler is machine language, compilation of a control transfer is similar to the assembly of a forward or backward goto in an assembly program.

Control structures like IF, FOR or WHILE cause a significant semantic gap between the PL domain and the execution domain because their control transfers are implicit rather than explicit.

This gap is bridged in two steps:

1. The control structure is mapped into an equivalent program containing explicit gotos.

2. This program is translated into an assembly language program.

if e1 then
    S1;
else
    S2;
S3;

is mapped into:

        if NOT e1 then goto int1;
        S1;
        goto int2;
int1:   S2;
int2:   S3;

while (e2)
    S1;
    S2;
    …
    Sn;
End While;
… other statements

is mapped into:

int3:   if NOT e2 then goto int4;
        S1;
        S2;
        …
        Sn;
        goto int3;
int4:   … other statements

The goto form of the if statement,

        if NOT e1 then goto int1;
        S1;
        goto int2;
int1:   S2;
int2:   S3;

is then translated into assembly language:

        { … instructions for NOT e1 … }
        BC ……, int1
        { … instructions for S1 … }
        BC ANY, int2
int1:   { … instructions for S2 … }
int2:   { … instructions for S3 … }
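The first step, mapping a control structure into explicit gotos, can be sketched in Python. The statement representation used here (plain strings for simple statements, tuples for if and while) and the function name lower_program are assumptions made for this illustration; labels are generated as int1, int2, ... in the style of the examples above.

```python
import itertools

def lower_program(stmts):
    """Lower nested statements into a flat list of lines with explicit gotos.

    A statement is a plain string (a simple statement),
    ("if", cond, then_stmts, else_stmts) or ("while", cond, body_stmts).
    """
    labels = (f"int{n}" for n in itertools.count(1))

    def lower(stmt):
        if isinstance(stmt, str):
            return [stmt + ";"]
        if stmt[0] == "if":
            _, cond, then_part, else_part = stmt
            l_else, l_end = next(labels), next(labels)
            return ([f"if NOT {cond} then goto {l_else};"]
                    + lower_all(then_part)
                    + [f"goto {l_end};", f"{l_else}:"]
                    + lower_all(else_part)
                    + [f"{l_end}:"])
        if stmt[0] == "while":
            _, cond, body = stmt
            l_top, l_exit = next(labels), next(labels)
            return ([f"{l_top}:", f"if NOT {cond} then goto {l_exit};"]
                    + lower_all(body)
                    + [f"goto {l_top};", f"{l_exit}:"])
        raise ValueError(f"unknown statement kind: {stmt[0]!r}")

    def lower_all(items):
        return [line for item in items for line in lower(item)]

    return lower_all(stmts)

print("\n".join(lower_program([("if", "e1", ["S1"], ["S2"]), "S3"])))
```

Each label is emitted on a line of its own, so the output differs from the layout above only in where the line breaks fall.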

PROCEDURE or FUNCTION Calls

A function call example, x = function1(y, z) + b * c;

Statement executes body of function1 and returns its value to the calling program. In addition the function call may also result in side effects.

Side Effects: A side effect of a function or procedure call is a change in the value of a variable which is not local to the called function.

Procedure call only achieves side effects; it does not return a value.

While implementing a function call, the compiler must ensure that:

1. Actual parameters are accessible in the called function.
2. The called function is able to produce side effects according to the rules of the PL.
3. Control is transferred to, and is returned from, the called function.
4. The function value is returned to the calling program.
5. All other aspects of the execution of the calling program are unaffected by the function call.

The compiler uses a set of features to implement function calls:

1. Parameter list: Contains a descriptor for each actual parameter of the function call.
2. Save area: The called function saves the contents of all CPU registers in this area before starting its execution.
3. Calling conventions: Conventions shared by the calling and the called functions:
   a. How the parameter list is accessed
   b. How the save area is accessed
   c. How the transfer of control is made and how the return is implemented
   d. How the function value is returned to the calling program

Parameter passing mechanism:

1. Call by value:

The values of the actual parameters are passed to the called function. These values are assigned to the corresponding formal parameters.

Passing by value takes place in one direction only, i.e. the function cannot return a value via these parameters.

The function cannot produce side effects on its parameters.

Call by value is generally used in built-in functions of the language. Its main advantage is simplicity. Function may allocate memory to formal parameters and copy value of actual parameter into this location at every call.

2. Call by value-result:

This mechanism extends the call by value by copying values of formal parameters back into corresponding actual parameters at return.

Side effects are realized at return. This mechanism incurs higher overheads.

3. Call by Reference:

In this mechanism, the address of actual parameter is passed to the called function. If parameter is an expression, its value is computed and stored in temp location and its address is passed to called function.

If the parameter is an array element, its address is similarly computed at the time of the call.

Changes made to the parameters by the called function are reflected back in the calling function.

4. Call by name:

This parameter transmission mechanism has the same effect as if every occurrence of a formal parameter in the body of the called function were replaced by the name of the corresponding actual parameter.

Actual parameter corresponding to a formal parameter can change dynamically during execution of a function. This makes the call by name mechanism powerful.

C – Call by value (call by reference is simulated by passing pointers)
Pascal – Call by value, call by reference
PL/I – Call by reference only
Ada – Choice of call by value-result or call by reference
ALGOL 60 – Call by value and call by name
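The four mechanisms can be imitated in Python to make their differences concrete. This is a sketch under assumptions: the caller's variables live in a dictionary env, a small hypothetical class Ref stands for the address of a variable, and a parameterless function (a thunk) stands for a call-by-name parameter. None of these names come from the notes.

```python
class Ref:
    """Stands for the address of one variable in the caller's environment."""
    def __init__(self, env, name):
        self.env, self.name = env, name
    def get(self):
        return self.env[self.name]
    def set(self, value):
        self.env[self.name] = value

def double_by_value(x):
    x = x * 2                    # changes only the local copy
    return x

def double_by_reference(ref):
    ref.set(ref.get() * 2)       # writes through to the caller's variable

def call_by_value_result(env, name, body):
    local = env[name]            # copy in at the call
    local = body(local)
    env[name] = local            # copy back at return

def sum_by_name(thunk, env, index_name, lo, hi):
    """The actual parameter (thunk) is re-evaluated at every use."""
    total = 0
    for i in range(lo, hi + 1):
        env[index_name] = i
        total += thunk()
    return total

env = {"a": 5, "i": 0, "v": [1, 2, 3]}
double_by_value(env["a"])                        # env["a"] is unchanged
double_by_reference(Ref(env, "a"))               # env["a"] is doubled
call_by_value_result(env, "a", lambda x: x + 1)  # env["a"] is updated at return
total = sum_by_name(lambda: env["v"][env["i"]], env, "i", 0, 2)
print(env["a"], total)
```

In sum_by_name the actual parameter v[i] denotes a different array element on every evaluation because i changes between uses, which is the dynamic behaviour described for call by name.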

Interpreters

Notations:
tc = average compilation time per statement
te = average execution time per statement
ti = average interpretation time per statement

Compilers and interpreters both analyze a source statement to determine its meaning.

During compilation, analysis of a statement is followed by code generation, while during interpretation it is followed by actions which implement its meaning.

Assume tc = ti and tc = 20 × te, i.e. executing a compiled statement is 20 times faster than compiling it.

Consider a program P:
size(P) = number of statements in P
stmts_executed(P) = number of statements executed during a run of P

CPU time required for Compilation and Interpretation:

Let size(P) = 200, and let a run of P execute 20 statements, then 10 iterations of a loop of 8 statements, then 20 statements for printing the result:

stmts_executed(P) = 20 + 10 × 8 + 20 = 120

Compilation model: 200 × tc + 120 × te = 200 × tc + 6 × tc = 206 × tc

Interpretation model: 120 × ti = 120 × tc
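The comparison can be checked with a few lines of Python. The function names are hypothetical, and the time unit is arbitrary (te = 1, so tc = ti = 20); only the ratio between the two totals matters.

```python
def compile_and_run_time(size, stmts_executed, tc, te):
    """CPU time when every statement is compiled once, then the program runs."""
    return size * tc + stmts_executed * te

def interpret_time(stmts_executed, ti):
    """CPU time when each executed statement is analyzed and interpreted."""
    return stmts_executed * ti

te = 1          # arbitrary time unit
tc = 20 * te    # assumption from the notes: tc = 20 x te
ti = tc         # assumption from the notes: tc = ti

compiled = compile_and_run_time(size=200, stmts_executed=120, tc=tc, te=te)
interpreted = interpret_time(stmts_executed=120, ti=ti)
print(compiled / tc, interpreted / tc)   # both totals expressed in units of tc
```

With these figures interpretation is cheaper for a single run. Compilation pays off when the program is run many times or when stmts_executed(P) is much larger than size(P).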

Use of Interpreters:

1. Efficiency in certain environments, and simplicity.

2. It is better to use an interpreter for a program P where stmts_executed(P) < size(P).

3. It is simpler to develop an interpreter than a compiler, since interpretation does not involve code generation.

Components of Interpreters

1 Symbol Table: It holds information concerning entities in the source program.

2 Data Store: The data store contains values of the data items declared in the program being interpreted.

The data store consists of set of components; each component is an array containing elements of distinct type.

3. Data manipulation routines: This is a set of routines containing one routine for every legal data manipulation action in the source language.

Process:

On analyzing a source statement, if the statement is a declaration, the interpreter allocates an element in the appropriate component of the data store for each declared variable; values assigned to the variable are stored there.

The memory mapping of each declaration is stored in the symbol table.

If the statement is a = b + c, where a, b and c are variables of the same type, the interpreter executes the following routines:

add (b, c, result); assign( a, result);

Advantages:

1. Meaning of source statement is implemented through execution of routine rather than code generation. This simplifies interpretation process.

2. Avoiding generation of machine language instructions makes interpreter portable.

Design and Operation of Interpreter

program interpreter (source, output);
  type
    symentry = record
      symbol  : array [1..10] of character;
      type    : character;
      address : integer;
    end;
  var
    symtab : array [1..100] of symentry;
    rvar   : array [1..100] of real;
    ivar   : array [1..100] of integer;
    r_tos  : 1..100;
    i_tos  : 1..100;

  procedure assign (addr1 : integer; value : integer);
  begin
    ivar[addr1] := value
  end;

  procedure add (sym1, sym2 : symentry);
  begin
    ….
    if (sym1.type = 'real') and (sym2.type = 'int') then
      addrealint (sym1.address, sym2.address);
    ….
  end;

  procedure addrealint (addr1, addr2 : integer);
  begin
    rvar[r_tos] := rvar[addr1] + ivar[addr2]
  end;

begin { MAIN PROGRAM }
  r_tos := 100;
  i_tos := 100;
  ….
  { analyze a source statement and call the appropriate routine }
end.

E.g.

    real a, b
    integer c
    c = 7
    b = 1.2
    a = b + c

Symbol   Type   Address
a        real   8
b        real   13
c        int    5

[Figure: contents of rvar and ivar, with r_tos and i_tos marking the tops of the two stacks]
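The three components (symbol table, data store, data manipulation routines) can be seen working together in a small Python sketch. It is an illustration under assumptions, not the interpreter from the notes: the class name TinyInterpreter is hypothetical, only declarations, constant assignments and statements of the form x = y + z are handled, and addresses are allocated from 0 upwards instead of the addresses 8, 13 and 5 shown in the example.

```python
class TinyInterpreter:
    def __init__(self):
        self.symtab = {}              # symbol table: name -> (type, address)
        self.rvar = [0.0] * 100       # data store component for reals
        self.ivar = [0] * 100         # data store component for integers
        self.next_free = {"real": 0, "int": 0}

    def declare(self, typ, names):
        """Allocate one data store element per declared variable."""
        for name in names:
            self.symtab[name] = (typ, self.next_free[typ])
            self.next_free[typ] += 1

    def load(self, name):
        typ, addr = self.symtab[name]
        return self.rvar[addr] if typ == "real" else self.ivar[addr]

    def assign(self, name, value):    # data manipulation routine
        typ, addr = self.symtab[name]
        if typ == "real":
            self.rvar[addr] = float(value)
        else:
            self.ivar[addr] = int(value)

    def add(self, left, right):       # data manipulation routine
        return self.load(left) + self.load(right)

    def run(self, source):
        for line in source.strip().splitlines():
            words = line.replace(",", " ").split()
            if not words:
                continue
            if words[0] in ("real", "integer"):        # declaration
                typ = "real" if words[0] == "real" else "int"
                self.declare(typ, words[1:])
            elif len(words) == 3:                      # name = constant
                self.assign(words[0], float(words[2]))
            else:                                      # name = x + y
                self.assign(words[0], self.add(words[2], words[4]))

interp = TinyInterpreter()
interp.run("""
real a, b
integer c
c = 7
b = 1.2
a = b + c
""")
print(interp.symtab, interp.load("a"))
```

No machine instructions are generated at any point: the meaning of each statement is carried out by calling a routine, which is what keeps such an interpreter simple and portable.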

PURE & Impure Interpreters

Pure Interpreters:
The source program is retained in source form all through its interpretation. This incurs substantial analysis overheads while interpreting each statement.

Impure Interpreters:
An impure interpreter performs some preliminary processing of the source program to reduce the analysis overheads during interpretation.

E.g. a preprocessor converts the program to an intermediate representation (IR), which is then used during interpretation.


Linkers

Execution of a program written in a language L involves the following steps:

1. Translation / compilation of the program.
2. Linking of the program with other programs / library files.
3. Relocation of the program.
4. Loading of the program in memory for the purpose of execution.

Source Program --> Translator --> Linker --> Loader --> Binary Program
                           \       /    \      /
                        Object Modules  Binary Programs

Relocation is done by the linker or loader for one of two reasons:

1. The same translated addresses may have been used in the object modules of a library program that needs to be linked with our program.

2. The operating system may require that a program should execute from a specific area of memory.

Address of Program Entities:

1. Translation Time Address: Address assigned by translator / compilers

2. Linked Address: Address assigned by linker

3. Load time address: Address assigned by the loader.

Origins of program:

1. Translated Origin: Address of origin / Start address assigned to program by translator.

2. Linked Origin: Address of the origin assigned by the linker while producing a binary program.

3. Load Origin: Address of origin/ starting address assigned by the loader while loading program for execution.

Relocation and Linking:

Address Sensitive Program: A program that contains instructions or data which refer to absolute memory addresses.

Program Relocation: is the process of modifying the address used in the address sensitive instructions of a program such that the program executes correctly from the designated area of memory.

If linked origin <> translated origin, relocation must be performed by the linker.
If load origin <> linked origin, relocation must be performed by the loader.

Absolute Loader: If load origin = linked origin, the loader does not need to perform relocation. Such a loader is called an absolute loader.

Relocating Loader: A loader that performs relocation.

Relocation Factor

t_origin_P = translated origin of program P
l_origin_P = linked origin of program P
symb = a symbol in program P
t_symb = translated address of symb
l_symb = linked address of symb
d_symb = displacement of symb from the origin of P

relocation_factor_P = l_origin_P - t_origin_P

t_symb = t_origin_P + d_symb
l_symb = l_origin_P + d_symb
       = t_origin_P + relocation_factor_P + d_symb
       = t_symb + relocation_factor_P
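As a quick check of these formulas, the following Python sketch computes a relocation factor and a linked address. The function names are hypothetical; the numbers are the ones used for programs P and Q in the example later in these notes.

```python
def relocation_factor(t_origin, l_origin):
    """relocation_factor = linked origin - translated origin"""
    return l_origin - t_origin

def linked_address(t_symb, t_origin, l_origin):
    """l_symb = t_symb + relocation_factor"""
    return t_symb + relocation_factor(t_origin, l_origin)

# Program P: translated origin 500, linked origin 900, symbol A at 540
print(relocation_factor(500, 900), linked_address(540, 500, 900))

# Program Q: translated origin 200, linked origin 942, symbol ALPHA at 231
print(linked_address(231, 200, 942))
```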

Linking

Linking is the process of binding an external reference to the correct link time address.

Binary Program

A binary program is a machine language program consisting of a set of program units SP such that, for every Pi in SP:

1. Pi has been relocated to the memory area starting at its linked origin.
2. Linking has been performed for each external reference in Pi.

Object Module:

The object module of a program contains all the information necessary to relocate and link the program with other programs.

The object module of a program P consists of:

1. Header: Contains the translated origin, size and execution start address of P.

2. Program: Contains the machine language instructions of P.

3. Relocation Table (RELOCTAB): Describes the relocation required. Each RELOCTAB entry contains one field:
   Translated address: translated address of an address sensitive instruction.

4. Linking Table (LINKTAB): Contains information about the public definitions and external references of P. Each LINKTAB entry contains:
   Symbol: symbolic name.
   Type: PD (public definition) or EXT (external reference).
   Translated address: for a public definition, the address of the first memory word allocated to the symbol; for an external reference, the address of the memory word which is required to contain the address of the symbol.

Example:

PROGRAM P

Translated origin of program = 500
Translated address of symbol A = 540

If linked origin of program = 900, then
linked address of symbol A = 940

PROGRAM Q

Statement            Address   Code

START 200
ENTRY ALPHA
….
ALPHA DS 25          231       00 0 025
TOT DS 1
END

Suppose program P is linked with program Q. ALPHA is an external reference in P and is defined in Q.

Linked origin of P = 900
Length of P = 42

Linked origin of Q = 900 + 42 = 942
Linked address of ALPHA = 942 + (231 - 200) = 973

The linker puts the address 973 in P wherever ALPHA is used in P.

Statement              Address   Code

START 500
ENTRY TOTAL
EXTERN MAX, ALPHA
READ A                 500       09 0 540
LOOP ….                501
MOVER AREG, ALPHA      518       04 1 000
BC ANY, MAX            519       06 6 000
….
BC LT, LOOP            538       06 1 501
STOP                   539       00 0 000
A DS 1                 540
TOT DS 1               541
END

Object program:

1. Header: translated origin = 500, size = 42, execution start address = 500

2. Machine language instructions

3. Relocation table (translated addresses of address sensitive instructions):
   500
   538

4. Linking table:
   ALPHA   EXT   518
   MAX     EXT   519
   A       PD    540

Design of Linker

Relocation requirements of a program are influenced by the addressing structure of the computer system on which it is to execute.

Relocation Algorithm:

1. program_linked_origin = <link origin> from the linker command
2. For each object module
   {
   a. t_origin = translated origin of the object module
      OM_size = size of the object module
   b. relocation_factor = program_linked_origin - t_origin
   c. Read the machine language program into work_area
   d. Read RELOCTAB of the object module
   e. For each entry in RELOCTAB
      {
      translated_addr = address in the RELOCTAB entry
      addr_in_work_area = address of work_area + translated_addr - t_origin
      Add relocation_factor to the operand address in the word with address addr_in_work_area
      }
   f. program_linked_origin = program_linked_origin + OM_size
   }

E.g. linked origin = 900, translated origin = 500, address of work_area = 300
relocation_factor = 900 - 500 = 400
First entry in RELOCTAB = 500: addr_in_work_area = 300 + 500 - 500 = 300 (the instruction READ A)
Second entry in RELOCTAB = 538: addr_in_work_area = 300 + 538 - 500 = 338
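The relocation algorithm can be sketched in Python as follows. The representation is an assumption made for this illustration: an object module is a dictionary, each machine word is a list [opcode, register, operand_address], and the work area is a Python list indexed from 0 rather than a memory area at address 300. The function name relocate is hypothetical.

```python
def relocate(object_modules, link_origin):
    """Return {linked_address: word} for all modules, relocated in sequence.

    Each module is a dict with keys 't_origin', 'code' (a list of
    [opcode, register, operand_address] words) and 'reloctab' (translated
    addresses of the address sensitive words).
    """
    program_linked_origin = link_origin
    image = {}
    for om in object_modules:
        t_origin = om["t_origin"]
        relocation_factor = program_linked_origin - t_origin
        work_area = [list(word) for word in om["code"]]    # copy of the module
        for translated_addr in om["reloctab"]:
            offset = translated_addr - t_origin            # position in work area
            work_area[offset][2] += relocation_factor      # fix operand address
        for offset, word in enumerate(work_area):
            image[program_linked_origin + offset] = tuple(word)
        program_linked_origin += len(work_area)            # OM_size
    return image

# Program P from the notes: 42 words, address sensitive words at 500 and 538.
code_p = [[0, 0, 0] for _ in range(42)]
code_p[0] = [9, 0, 540]      # READ A
code_p[38] = [6, 1, 501]     # BC LT, LOOP
module_p = {"t_origin": 500, "code": code_p, "reloctab": [500, 538]}

image = relocate([module_p], link_origin=900)
print(image[900], image[938])
```

With a linked origin of 900 the operand of READ A becomes 940 and the operand of BC LT, LOOP becomes 901, in line with the relocation factor of 400 computed in the example.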

Linking Requirements

In FORTRAN, all program units are translated separately. Hence all subprogram calls and common variable references require linking.

Pascal procedures are nested inside the main program. Hence procedures do not require linking.

In C, program files are translated separately. Thus only function calls that cross file boundaries and references to global data require linking.

The linker processes all object modules being linked and builds a table of all public definitions and their load time addresses.

A name table (NTAB) is defined for use in program linking. Each entry contains:
   symbol: symbolic name of an external reference or an object module
   linked_address: for a public definition, the linked address of the symbol; for an object module, the linked origin of the object module

When program P and program Q are linked, the linker builds the following table:

NTAB:

Symbol   Linked address
P        900
A        940
Q        942
ALPHA    973

Linking Algorithm

1. program_linked_origin = <link origin> from the LINKER command
2. For each object module
   {
   a. t_origin = translated origin of the object module
      OM_size = size of the object module
   b. relocation_factor = program_linked_origin - t_origin
   c. Read the machine language program into work_area
   d. Read LINKTAB of the object module
   e. For each LINKTAB entry with type = PD
      {
      name = symbol
      linked_address = translated_address + relocation_factor
      Enter (name, linked_address) in NTAB
      }
   f. Enter (object module name, program_linked_origin) in NTAB
   g. program_linked_origin = program_linked_origin + OM_size
   }
3. For each object module
   {
   a. t_origin = translated origin of the object module
      program_linked_origin = linked address of the object module from NTAB
   b. For each LINKTAB entry with type = EXT
      {
      addr_in_work_area = address of work_area + program_linked_origin - <link origin> + translated_address - t_origin
      Search the symbol in NTAB and copy its linked address
      Add the linked address to the operand address in the word with address addr_in_work_area
      }
   }
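A Python sketch of the two steps of the linking algorithm is given below. It is an illustration under assumptions: object modules are dictionaries, the function names build_ntab and resolve_externals are hypothetical, and the size of Q (57 words) is assumed because the notes do not state it. Instead of patching a work area, the second step only reports which linked address has to be placed in which word.

```python
def build_ntab(object_modules, link_origin):
    """Step 2: record linked addresses of modules and public definitions."""
    ntab = {}
    program_linked_origin = link_origin
    for om in object_modules:
        relocation_factor = program_linked_origin - om["t_origin"]
        for symbol, kind, translated_addr in om["linktab"]:
            if kind == "PD":
                ntab[symbol] = translated_addr + relocation_factor
        ntab[om["name"]] = program_linked_origin
        program_linked_origin += om["size"]
    return ntab

def resolve_externals(om, ntab):
    """Step 3: map the linked address of each word holding an external
    reference to the linked address of the symbol it refers to.
    Symbols missing from NTAB are left unresolved."""
    relocation_factor = ntab[om["name"]] - om["t_origin"]
    return {translated_addr + relocation_factor: ntab[symbol]
            for symbol, kind, translated_addr in om["linktab"]
            if kind == "EXT" and symbol in ntab}

module_p = {"name": "P", "t_origin": 500, "size": 42,
            "linktab": [("ALPHA", "EXT", 518), ("MAX", "EXT", 519), ("A", "PD", 540)]}
module_q = {"name": "Q", "t_origin": 200, "size": 57,   # size of Q is assumed
            "linktab": [("ALPHA", "PD", 231)]}

ntab = build_ntab([module_p, module_q], link_origin=900)
print(ntab)
print(resolve_externals(module_p, ntab))
```

The resulting NTAB holds the same four entries as the table shown for P and Q. MAX stays unresolved because no module in this example defines it.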

SELF Relocating Programs

The manner in which a program can be modified, or can modify itself, to execute from a given load origin can be used to classify programs into the following:

1. Non-relocatable programs
2. Relocatable programs
3. Self-relocating programs

A non-relocatable program is a program which cannot be executed in any memory area other than the area starting at its translated origin.

A non-relocatable program does not contain information about its address sensitive instructions.

E.g. a hand coded machine language program.

Object modules are relocatable programs: they can be modified to execute from another area of memory.

A self-relocating program is a program which can perform the relocation of its own address sensitive instructions.

A self-relocating program contains two provisions for this purpose:

1. A table of information concerning the address sensitive instructions exists as a part of the program.

2. Code to perform the relocation of the address sensitive instructions, called the relocating logic, also exists as a part of the program.

- Self-relocating programs are less efficient than relocatable programs.
- A self-relocating program does not need a linker.
- A self-relocating program needs to find its load address before it can execute its relocating logic.

MS DOS Linker

LINKER uses a two pass strategy. In the first pass, the object modules are processed to collect information concerning segments and public definitions.

The second pass performs relocation and linking.

Object Module of MS DOS

An Intel 80x86 object module is a sequence of object records. Each object record describes a specific aspect of the program in the object module.

There are 14 types of object records, which contain five basic categories of information:

1. Binary image
2. External references
3. Public definitions
4. Debugging information (e.g. line numbers in the program)
5. Miscellaneous information (e.g. comments in the program)

Record   Description
THEADR   Translator header record
LNAMES   List of names record
SEGDEF   Segment definition record
EXTDEF   External names definition record
PUBDEF   Public names definition record
LEDATA   Enumerated data record
FIXUPP   Fixup record
MODEND   Module end record

Pass I

1. program_linked_origin = <load origin>
2. Repeat step 3 for each object module
3. Select an object module and process its object records
   a. If an LNAMES record, enter the names in NAMELIST
   b. If a SEGDEF record
      b1. i = name index, segment_name = NAMELIST[i]
          segment_addr = start address in attributes
      b2. If an absolute segment, enter (segment_name, segment_addr) in NTAB
      b3. If the segment is relocatable and cannot be combined with other segments
          - align the address contained in program_linked_origin on the next word
          - enter (segment_name, program_linked_origin) in NTAB
          - program_linked_origin = program_linked_origin + segment length
   c. If a PUBDEF record
      c1. i = base, segment_name = NAMELIST[i], symbol = name
      c2. segment_addr = load address of segment_name in NTAB
      c3. symbol_addr = segment_addr + offset
      c4. Enter (symbol, symbol_addr) in NTAB

Pass 2

1. list_of_object_modules = object modules named in the LINKER command
2. Repeat step 3 until list_of_object_modules is empty
3. Select an object module and process its object records
   a. If an LNAMES record, enter the names in NAMELIST
   b. If a SEGDEF record
      i = name index, segment_name = NAMELIST[i]
      segment_addr = start address in attributes
   c. If an EXTDEF record
      1. external_name = name from the EXTDEF record
      2. If external_name is not found in NTAB
         - locate the object module which contains external_name
         - add that object module to list_of_object_modules
         - perform the first pass of LINKER for the new object module
      3. Enter (external_name, load address from NTAB) in EXTTAB
   d. If an LEDATA record
      i = segment index, d = data offset
      program_load_origin = SEGTAB[i].load_addr
      addr_in_work_area = address of work_area + program_load_origin - <load origin> + d
      Move data from LEDATA into the memory area starting at addr_in_work_area
   e. If a FIXUPP record, for each FIXUPP specification
      f = offset from locate field
      fix_up_addr = addr_in_work_area + f
      Perform the required fixup using a load address from SEGTAB or EXTTAB
   f. If a MODEND record
      If a start address is specified, compute its load address and record it in the executable file

LINKING FOR OVERLAYS

An overlay is a part of a program (or software package) which has the same load origin as some other part(s) of the program.

Overlay structured program:

An overlay structured program consists of:

1. A permanently resident portion, called the root.
2. A set of overlays.
