to have an understanding of foundations of design of ... - …  · web viewthe term...

100
SP NOTES Synergy Institute of Engineering & Technology,Dhenkanal Biju Patnaik University of Technology, Orissa Nirjharinee Parida Lecture notes On SYSTEM PROGRAMMING Subject code:PCCS4202 Semester:4 th Branch: Computer science & engineering Course title: System Programming Lectures: 32hrs per semester Lecture planned: 47 classes per semester Course aim: Mrs. Nirjharinee Parida, Sr. Lecturer(S.I.E.T,Dhenkanal) Page 1

Upload: hakhanh

Post on 16-Dec-2018

213 views

Category:

Documents


0 download

TRANSCRIPT

SP NOTES

Synergy Institute of Engineering amp TechnologyDhenkanal

Biju Patnaik University of Technology Orissa

Nirjharinee Parida

Lecture notes

On

SYSTEM PROGRAMMING

Subject codePCCS4202

Semester4th

Branch Computer science amp engineering

Course title System Programming

Lectures 32hrs per semester

Lecture planned 47 classes per semester

Course aim

To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler

Course description

System software Machine structureMachine Language assembly language program assembler macro-processor Loader Linker Programming language Formal systems Compiler

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 1

SP NOTES

CLASS-WISE NOTES ON SYSTEM PROGRAMMING

Lecture_no1

PREREQUISITES

One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization

In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software

AIM

To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler

OBJECTIVES

To understand the relationship between system software and machine architecture To know the design and implementation of Assemblers To know the design and implementation of Linkers and Loaders To have an understanding of macroprocessors To know the function of Compiler

EXPECTED OUTCOME

This course provides knowledge to design various system programs

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 2

SP NOTES

MODULE-1

Lecture_no2 Introduction to software its types machine structure

bullDefinition of software its types example its application

bullWhat is Programming

bullWhat is system programming

Ans System programming is a branch of computer science and engineering that deals with the structure of the machine The structure of a machine may compose of memory IO processor CPU (Central Processing Unit) card reader and printer etc In other words system programming is one of high level procedural language knowledge of data structure and computer organization

bullDifferentiate between system software and application software

bull Define computer architecture

bull Define computer organization

bullGeneral hardware organization of a computer system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 3

SP NOTES

Lecture_no3 Evolution of components of system programming

Assembler

An assembler translates assembly language into machine code Assembly language consists of mnemonics for machine opcodes so assemblers perform a 11 translation from mnemonic to a direct instruction For example

LDA 4 converts to 0001001000100100

Conversely one instruction in a high level language will translate to one or more instructions at machine level

Advantages of using an Assembler

Very fast in translating assembly language to machine code as 1 to 1 relationshipAssembly code is often very efficient (and therefore fast) because it is a low level languageAssembly code is fairly easy to understand due to the use of English-like mnemonics

Disadvantages of using Assembler

Assembly language is written for a certain instruction set andor processorAssembly tends to be optimised for the hardware its designed for meaning it is often incompatible with

different hardwareLots of assembly code is needed to do relatively simple tasks and complex programs require lots of

programming time

Compiler

A Compiler is a computer program that translates code written in a high level language to a lower level language objectmachine code The most common reason for translating source code is to create an executable program (converting from a high level language into machine language)

Advantages of using a compiler

Source code is not included therefore compiled code is more secure than interpreted codeTends to produce faster code than interpreting source codeProduces an executable file and therefore the program can be run without need of the source code

Disadvantages of using a compiler

Object code needs to be produced before a final executable file this can be a slow processThe source code must be 100 correct for the executable file to be produced

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 4

SP NOTES

Interpreter

An interpreter program executes other programs directly running through program code and executing it line-by-line As it analyses every line an interpreter is slower than running compiled code but it can take less time to interpret program code than to compile and then run it mdash this is very useful when prototyping and testing code Interpreters are written for multiple platforms this means code written once can be run immediately on different systems without having to recompile for each Examples of this include flash based web programs that will run on your PC MAC games console and Mobile phone

Advantages of using an Interpreter

Easier to debug(check errors) than a compilerEasier to create multi-platform code as each different platform would have an interpreter to run the same

codeUseful for prototyping software and testing basic program logic

Disadvantages of using an Interpreter

Source code is required for the program to be executed and this source code can be read making it insecureInterpreters are generally slower than compiled programs due to the per-line translation method

1)Name the three types of program translator and describe what level of language each translates

Answer

Assembler - Low Level (Assembly) Compiler - High Level Interpreter - High Level

2) Why might you want to use a compiler instead of an interpreter

A compiler makes faster more secure code

3) Why might you want to use an interpreter instead of a compiler

Interpreters allow for code to run on multiple platforms it allows you to debug and test code without having to re-compile the entire source code

Loader

A Loader is system program that place the object program into main memory and prepares it for execution

bull Basic functions of loader

ndash Allocation

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 5

SP NOTES

ndash Linking

ndash Relocation

ndash Loading

Types of Loader-

bull Compile-and-go Loader

bull Relocating Loader

bull Direct Linking Loader

bull Absolute Loader

bull General Loader

bull Dynamic Loader

Macro amp Macro processor

Macro ndash Macro is a single line abbreviation for a group of instruction

MACRO -------- Start of definition

INCR -------- Macro name

A 1DATA

A 2DATA Sequence of instructions to be abbreviated

A 3DATA

MEND -------- End of definition

Linking and Linker

bull Linking ndash The Process of merging many object modules to form a single object program is called as linking

bull Linker bull The Linker is the software program which binds many object modules to make a single object program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 6

SP NOTES

Lecture_no4 Foundation of system programming

1) Formal Systembull A formal system is an un interpreted calculus

It consists of

ndash Alphabets

ndash A set of words called Axioms

ndash Finite set of relations called rules of inference or production rules

ndash Ex Boolean algebra

2) Need Of System Software-The basic need of system software is to achieve the following goals -

bull To achieve efficient performance of the system

bull To make effective execution of general user program

bull To make effective utilization of human resources

bull To make available new better facilities

Need of system programming

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 7

SP NOTES

Lecture_no5 Evolution of operating systemGeneral machine structure

OSJobmultiprogrammingMultitasking

FragmentationPagingScheduling

Facilities of computer os

Operating Systembull It is the collection of system programs which acts as an interface between user and the computer and computer hardware

bull The purpose of an operating system is to provide an environment in which A user can execute programs in a convenient manner

Functions of Operating System

bull File handling and management

bull Storage management (Memory management)

bull Device scheduling and management

bull CPU scheduling

bull Information management

bull Process control (management)

bull Error handling

bull Protecting itself from user amp protecting user from other users

What is MFT

What is MVT

What is Fragmentation

What is paging Explain its various types

What is scheduling

What is meant by the term time sharing

What is virtual memoryWhat are simple batched systems Explain

Give an evolution of an operating system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 8

SP NOTES

Lecture_no6 Description of Von-Neumann machines components

General machine structureVon-Neumannrsquos componentsBlock diagram and functions of each component of Von-neumannrsquos machine

1 The von Neumann Computer Modelbull Von Neumann computer systems contain three main building blocks-gt the central processing unit (CPU)-gt memory-gt and inputoutput devices (IO)bull These three components are connected together using the system busbull The most prominent items within the CPU are the registers they can be manipulated directly by acomputer program The following block diagram shows major relationship between CPU components

(Design of the VonNeumann architecture)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 9

SP NOTES

2 Components of the Von Neumann Model-gt Memory Storage of information (dataprogram)-gt Processing Unit ComputationProcessing of Information-gt Input Means of getting information into the computer eg keyboard mouse-gtOutput Means of getting information out of the computer eg printer monitor bull Control Unit Makes sure that all the other parts perform their tasks correctly and at the correct time The von Neumann Machine

(connection between CPU amp Main memory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 10

SP NOTES

MBR (Memory Buffer Register) = MBR contains a word to be stored in memory or is sued to receive from memory

MAR (Memory Address Register) = MAR specifies the address in memory of the word to be written from or read into the MBR

IR (Instruction Register) = IR contains the 8-bit op-code instruction being executedIBR (Instruction Buffer Register) = IBR is employed to hold temporarily the right-hand instructions

from a word in memoryPC (Program Counter) = PC contains the address of the next instruction-pair to be fetched from

memoryAC and MQ (Accumulator and Multiplier Quotient) = AC and MQ are employed to hold temporarily

operands and results of ALU operations

Fetch The process of retrieving a value from a memory location in order to put it in a register or pass it to a processing unit

IO controller A special-purpose processor that mediates between the main computer processor and a specific input or output device The processor records the request made by the main processor leaving it free to continue other tasks and interrupts the main processor when the input or output task is complete

Input Output The subsystem responsible for managing communications with external devices whether external memory storage other processors or devices for human interaction

Instruction set The set of codes that describe all the legal operations a particular computer can execute

Op code An unsigned binary integer that is assigned to a specific task the hardware can perform

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 11

SP NOTES

Register A special kind of very fast memory which is referred to by name rather than by a memory address number Registers often hold data values that are currently in use by the ALU or other subsystems of the architecture

Random access memory (RAM) The typical computer memory where each cell is addressed individually and may be updated or accessed in time that is independent of the cellrsquos location

Read-only memory (ROM) A memory that is usually very similar to RAM except it may only be accessed not updated

Store The process of writing a value to a memory cell

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 12

SP NOTES

Lecture_no7 Instruction format with microflowchart for ADD instruction

Flowchart for instruction fetchamp execution of general machine structure

Typical operating system Information flow in microflowchart for ADDSUBMULT

Memory operations

FETCH (addr)

1Put addr into MAR2Tell memory unit to ldquoloadrdquo3Memory copies data into MDR

STORE(addrnew-value)

1 Put addr into MAR2 Put new-value into MDR3 Tell memory unit to ldquostorerdquo4 Memory stores data from MDR into memory cell

Problem 1)

a) Write an assembly program for a GPR machine having 8 general purpose registers (R1R7) to evaluate the following expression Ignore the remainder part of the division process

( (a + b) (a + d) ) (a e)

where abde are variables with addresses ABDE respectively and each operation is defined as follows + --------gt ADD Ri Rj ( Rj lt- Ri + Rj) --------gt MULT Ri Rj ( Rj lt- Ri Rj) --------gt DIV Ri Rj ( Rj lt- Rj Ri)

a = M[A] ------gt content of memory location at address A same for B D E

move A D0 D0 = amove B D4 D4 = badd D0 D4 D4 = a + bmove D D1 D1 = dadd D0 D1 D1 = a + ddiv D1D4 D4 = (a + b) (a + d)move E D2 D2 = emult D0 D2 D2 = (a e)mult D4 D2 D2 = ( (a + b) (a + d) ) (a e)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 13

SP NOTES

Problem 2)

a) Provide micro-instructions in the form of detailed flowchart for the execution of the following instructions

b) Show the flow of the micro-instructions within the GPR architecture by highlighting the system buses and the registers used

i)move X D0 move contents of memory location X to register D0ii)move D0 X move contents of register D0 to memory location Xiii)movea 22(a3)25(a1) move contents from memory to memoryiv)bgtw label branch if greater than to location ldquolabelrdquo

Answer 2)

For each of the above for part b) draw the CPU consisting of all the registers (for example MAR MBR PC IR Address and Data Registers ALU) and Memory and show each of the micro-instruction as in part a)

i) move X D0Machine Instruction

0011 0000 0011 1000

Address of X

a) Micro-steps are

MAR lt-- PC

MBR lt-- M[MAR] fetch

IR lt-- MBR

| decode |PC lt-- PC + 2 PC points to source addressMAR lt-- PCMBR lt-- M[MAR] address of X in MBRMAR MBR move X to MARMBR M[MAR] read contents of location XD0 MBR PC lt-- PC + 2 PC points to next instruction

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 14

SP NOTES

ii) move D0 X

Machine Instruction

0011 1110 0000 0000

Address of X

a) Micro-steps areMAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | decode |PC lt-- PC + 2 PC points to destination addressMAR lt-- PCMBR M[MAR] read destination address XMAR MBR MAR has X now

MBR D0 source data in MBR

M[MAR] MBR data in Effective address X (which is in MAR)

PC lt-- PC + 2 PC points to next instruction

iii) movea 22(a3)25(a1)

Machine Instruction

0011 0011 0110 1011

0000 0000 0001 0110

0000 0000 0001 1001

a)Source Effective address = contents of a3 + $16temp lt--- M[a3 + $16]

Destination Effective address = contents of a1 + $19M[a1 + $19] lt- temp

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 15

SP NOTES

Micro-steps areMAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | decode |PC lt-- PC + 2 PC points to source displacementMAR lt-- PCMBR lt-- M[MAR] source displacement loadedMAR lt-- [A3 + MBR] source effective address calculatedMBR lt- M[MAR] source data in MBRWR lt- MBR saved temporarily since MBR is to be reusedPC lt-- PC + 2 PC points to destination displacementMAR lt-- PCMBR lt-- M[MAR] destination displacement loaded

MAR lt-- [A1 + MBR] destination effective address calculatedMBR lt- WR source data back to MBR

M[MAR] lt-- MBR source data moved to destination

PC lt-- PC + 2 PC points to next instruction

iv) bgtw label

Machine Instruction

0110 1110 0000 0000 ---- displacement ----

a)MAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | interpret decode |

TRUE FALSE | |

PC lt-- PC + 2 PC lt- PC + 4 MAR lt-- PC MBR lt-- M[MAR] get displacement PC lt-- PC + MBR execute

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 16

SP NOTES

Lecture_no8 Memory unit Register Data of IBM 360

General machine structure of IBM 360370

The IBM System360 (S360) was a mainframe computer system family announced by IBM on April 7 1964 and delivered between 1965 and 1978 It was the first family of computers designed to cover the complete range of applications from small to large both commercial and scientific

(i)Memory unit

Memory (storage) in System360 is addressed in terms of 8-bit bytes Various instructions operate on larger units called halfword (2 bytes) fullword (4 bytes) doubleword (8 bytes) quad word (16 bytes) and 2048 byte storage block specifying the leftmost (lowest address) of the unit Within a halfword fullword doubleword or quadword low numbered bytes are more significant than high numbered bytes this is sometimes referred to as big-endian Many uses for these units require aligning them on the corresponding boundaries Within this article the unqualified term word refers to a fullword

The architecture of System360 provided for up to 224 = 16777216 bytes of memory however the Model 67 extended the architecture and allowed 232 = 4294967296 bytes of virtual memory

(ii)Register-

Registers are areas of storage that are set aside in the processing unit

There are 16 registers in the 370 system (General purpose registerFloating-point registerControl amp status registerdata register address register)

Four fields of information-1) the name field- optional field and must begin with a character from the alphabet can consist of 8 characters2) opcode ndash mandatory field and must be followed by a blank3) operands ndash usually represent data that could be in main storage registers or as part of the instruction itself4) comments ndash used to clarify instructions

(iii)Data

Characters are stored as 8-bit bytes Integers are stored as twos complement binary halfword or fullword values Packed decimal numbers are stored as 1-16 8-bit bytes containing an odd number of decimal digits

followed by a 4-bit sign Sign values of hexadecimal A C E and F are positive and sign values of hexadecimal B and D are negative Digit values of hexadecimal A-F and sign values of 0-9 are invalid but the PACK and UNPK instructions do not test for validity

Zoned decimal numbers are stored as 1-16 8-bit bytes each containing a zone in bits 0-3 and a digit in bits 4-7 The zone of the rightmost byte is interpreted as a sign

Floating point numbers are only stored as fullword or doubleword values on older models On the 36085and 360195there are also extended precision floating point numbers stored as quadwords For all

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 17

SP NOTES

three formats bit 0 is a sign and bits 0-7 are a characteristic (exponent biased by 64) Bits 8-31 (8-63) are a hexadecimal fraction For extended precision the low order doubleword has its own sign and characteristic which are ignored on input and generated on output

Lecture_no9 Various instruction format of IBM 360370

( iv)Instruction

Instructions in the S360 are two four or six bytes in length with the opcode in byte 0 Instructions have one of the following formats

RR (two bytes) R tells that there is data residing in a register So in this type there is data in register R

Generally byte 1 specifies two 4-bit register numbers but in some cases eg SVC byte 1 is a single 8-bit immediate field

RS (four bytes) This instruction can have 2 or 3 operands It depends how many the opcode indicates If it is 2 the first is a register and the second storage If it is 3 then the first and second are registers and the third is a storage location

Byte 1 specifies two register numbers bytes 2-3 specify a base and displacement

RX (four bytes) The X also refers to a storage address however this specifies actual storage addresses in 3 pieces

Byte 1 bits 0-3 specifies either a register number or a modifier byte 1 bits 4-7 specifies the number of the general register to be used as an index bytes 2-3 specify a base and displacement

SI (four bytes) I tells that the operand consists of immediate data (meaning that the data is in the instruction itself) So in this type the first is a storage location and the second is immediate data

Byte 1 specifies an immediate field bytes 2-3 specify a base and displacement

SS (six bytes) S tells that that there is data in storage Here both operands have data in storage

Byte 1 specifies two 4-bit length fields or one 8-bit length field bytes 2-3 and 4-5 each specify a base and displacement The encoding of the length fields is length-1

Instructions must be on a two-byte boundary in memory hence the low-order bit of the instruction address is always 0

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 18

SP NOTES

Addressing Mode

1Implied mode Operand is in a register implied by the OpCode of the instruction ExampleADD 312 Immediate mode Operand is a constant value specified in the instruction itself ExampleADD R 103 Register mode Operand is in register specified in the instruction itself ExampleADD S D

4 Register indirect mode Operand are is in a memory address that is content of a register specifiedby the instruction itself ExampleADD (D) 35 Direct mode Absolute address of operand is explicitly specified in instruction ExampleADD 1234 S6 Indirect mode Absolute address of operand is content of a memory address ExampleADD [1234] 107 Relative mode Content of PC + OpAddr ExampleADD D $S8 Indexed mode Content of an index register + OpAddr ExampleADD D 500(S)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 19

SP NOTES

Lecture_no10 Machine language Long way no looping method

Elementary Instruction Set

Typical Data Transfer Instructions

Typical Arithmetic Instructions

Typical Logical and Bit Manipulation Instr

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 20

SP NOTES

Example Write a program that compare two positive numbers x and y Output is -1 if x lt y 0if x = y +1 if x gt y

MOVE R1 x MOVE R2 y MOVE R3 0 SUB R1 R2 BN Less BZ Zero MOVE R3 +1Less MOVE R3 -1Zero MOVE R3 0 HLT

Lecture_no11 Address modification using instruction as data ampusing index register

Lecture_no12 Looping method with example

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 21

SP NOTES

Lecture_no13 Assembly language machine language

Assembly LanguageAssembly instructions are just shorthand for machine instructions

Machine language Equivalent assembly1000000100100101 LOAD R1 51000000101000101 LOAD R2 51010000100000110 ADD R0 R1 R21000001000000110 SAVE R0 61111111111111111 HALT (For all assembly instructions that compute a result the first argument is the destination)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 22

SP NOTES

Very easy to write an Assembly Language -gtMachine language translator

Exercise

What would be the assembly instructions to swap the contents of registers 1 amp 2Exercise Solution

STORE 1 R1MOVE R1 R2LOAD R2 1HALT

Machine Language

The machine language for a particular computer is tied to the architecture of the CPU For example G4 Macs have a different machine language than Intel PCs We will look at the machine language of a simple simulated computer

Machine Language ExampleOur computer has 4 registers and 32 memory locations Each instruction is 16 bits Here is a machine language program for our simulated computer

10000001001001011000000101000101101000010000011010000010000001101111111111111111

LOAD contents of memorylocation into register

100000010 RR MMMMMex R0 = Mem[3]100000010 00 00011

STORE contents of registerinto memory location

100000100 RR MMMMMex Mem[4] = R0100000100 00 00100

MOVE contents of oneregister into another register

100100010000 RR RRex R0 = R1100100010000 00 01

ADD contents of 2 registersstore result in third

1010000100 RR RR RRex R0 = R1 + R21010000100 00 01 10

SUBTRACT contents of 2registers store result into third

1010001000 RR RR RRex R0 = R1 ndash R2

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 23

SP NOTES

1010001000 00 01 10

Halt the program 1111111111111111

Lecture_no14 Meaning of Assembly language program types Literals

bullUSING

bullBALR

bullSTART

bullEND

bullBR

bullLiteral

bullLTORG

bullBCT

bullEQU ExampleSTART EQU 10

The directive EQU stands for equals it allows you to give a numerical value to a label In this example the assembler would always replace START with 1016 Hence you could write

START EQU 10ORG STARTGOTO 10

bullORG

The directive ORG does not get translated into any code what it does is to set the address at which the next instruction will be stored

Example

ORG 0GOTO 10

The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000

But it will be placed at the address 0 ORG 0 does not get translated into any code but it affects where the code for GOTO 10 is placed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 24

SP NOTES

bullpseudo-ops vs machine-ops

Lecture_no15 Example on Assembly language program

Assembly language is a low-level programming language for a computer or other programmable device specific to a particular computer architecture in contrast to most high-level programming languages which are generally portable across multiple systems Assembly language is converted into executable machine code by a utility program referred to as an assembler like NASM MASM etc

LABEL OPCODE OPERANDS COMMENTS01 02 Program to multiply an integer by the constant 603 Before execution an integer must be stored in NUMBER0405 ORIG x305006 LEA R2 NUMBER07 LDW R2 R2 008 LEA R1 SIX09 LDW R1 R1 00A AND R3R30 Clear R3 It will0B contain the product0C The inner loop0D 0E AGAIN ADD R3R3R20F ADD R1R1-1 R1 keeps track of10 BRp AGAIN the iterations11 12 HALT13

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 25

SP NOTES

14 NUMBER BLKW 115 SIX FILL x000616 17 END

Figure 71 An assembly language programTwo of the parts (LABEL and COMMENTS) are optionalOpcodes and OperandsTwo of the parts (OPCODE and OPERANDS) are mandatory The OPCODE is a symbolic name for the opcode

The number of operands depends on the operation being performedAGAIN ADD R3R3R2The operands to be added are obtained from register 2 and from register 3 The resultis to be placed in register 3 We represent each of the registers 0 through 7 as R0 R2helliphelliphelliphellip R7

The location whose address is to be read is given the label NUMBER The destination into which that address is to be loaded is register 2LEA R2 NUMBERA literal value must contain a symbol identifying the representation base of the number We use for decimal x for hexadecimal and b for binary

LabelsLabels are symbolic names which are used to identify memory locations that are referred to explicitly in the programThere are two reasons for explicitly referring to a memory location1 The location contains the target of a branch instruction (for example AGAIN in line 0E)2 The location contains a value that is loaded or stored (for example NUMBER line 14 and SIX line 15)The location AGAIN is specifically referenced by the branch instruction in line 10

BRp AGAIN

If the result of ADD R1R1ndash1 is positive (as evidenced by the P condition code being set) then the program branches to the location explicitly referenced as AGAIN to perform another iterationThe value stored in the memory location explicitly referenced as NUMBER is loaded into R2If a location in the program is not explicitly referenced then there is no need to give it a label

CommentsComments are messages intended only for human consumption The purpose of comments is to make the program more comprehensible to the human reader

Pseudo-ops (Assembler Directives)

Actually a more formal name for a pseudo-op is assembler directive They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution Rather the pseudo-op is strictly a message to the assembler to help the assembler in the assembly processORIG

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 26

SP NOTES

ORIG tells the assembler where in memory to place the program In line 05 ORIG x3050 says start with location x3050 As a result the LEA R2NUMBER instruction will be put in location x3050FILLFILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand In line 15 the ninth location in the resultant program is initialized to the value x0006BLKWBLKW tells the assembler to set aside some number of sequential memory locations (ie a BLocK ofWords) in the program The actual number is the operand of the BLKW pseudo-op In line 11 the pseudo-op instructs the assembler to set aside one location in memory

Lecture_no16 surprise test

Lecture_no17 MODULE-2

AssemblerWhy learning AssemblerAssembler or other languages that is the question Why should I learn another language if I alreadylearned other programming languages The best argument while you live in France you are able to getthrough by speaking english but you will never feel at home then and life remains complicated You canget through with this but it is rather inappropriate If things need a hurry you should use the countryslanguageAssemblerTo translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmerDisassenblerTo translate machine code to their assembly source

Cross AssemblerThe assembler may run on one machine and produce object code for another

An assembly is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader

(Fig Assembler)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 27

SP NOTES

For example the externally defined symbols (library program) must be indicated to the loaderthe assembler does not know the address of these symbols and it is up to the loader to find theprograms containing them load them into memory and place the values of these symbols in thecalling program

(Fig Program Translation Steps)

Fundamental assembler functions1048698 Translate mnemonic operation codes their machine language equivalents1048698 Assign machine addresses to symbolic labels used by the programmer

Basic Assembler Functions

Convert mnemonic operation codes to machine language equivalents

Convert symbolic operands to machine addresses (pass 1)

Build machine instructions in the proper format

Convert data constants to internal (machine)representations

Error checking is provided

Write the object program and assembly listing files

Assembler directives

-gtThe assembler can also process assembler directivesbull Assembler directives (or pseudo-instructions) provide instructions to the assembler itself Theyare not translated into machine instructions

-gtPseudo-Instructions Not translated into machine instructions Providing information to the assembler-gtBasic assembler directives-

bull START -Specify name and staring address for the program END- Indicate the end of the source program and (optionally) specify the first executable instruction in

the program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 28

SP NOTES

BYTE -Generate character or hexadecimal constant occupying as many bytes as needed to represent the constant

WORD- Generate one-word integer constant RESB- Reserve the indicated number of bytes for a data area RESW- Reserve the indicated number of words for a data area

Lecture_no18 General design procedure of 360-type Assembler

Syntax Assembly language statements

Assembly language statements are entered one statement per line Each statement follows the following format

[label] mnemonic [operands] [comment]

The fields in the square brackets are optional A basic instruction has two parts the first one is the name of the instruction (or the mnemonic) which is to be executed and the second are the operands or the parameters of the command

An assembly language statement contains the following fields -gt Label Field -it is an identifier amp optional fieldThe maximum lengnth of label differs between assemblers It can be used to define a symbolSome accepts upto 32 characters long other only 4 charactersA label when declared is suffixed by a colon and begins with valid characters(AhelliphellipZ)

Ex-

START LDAA 24 H

Here Label START is equal to the address of the instruction LDAA 24 H -gt MnemonicOpcode -This field contains a mnemonic It stands for operation code or pseudo-opie a machine code instruction it requires additional information ieoperands -gt Operand Field -it specifies either the address or the data that the opcode requires -gt Comment Field allows the programmer to document the software This field is optional and is only printed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 29

SP NOTES

on the source listing for documentation purposes The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character

Types of Assembly language statements

It consists of 3 typeseg

(i)Imperative statement

(ii)Declarative statements

(iii)Assembler directive statements

Advantages amp disadvantages of Assembly language

Lecture_no19 Design Specification of 1-pass assembler with example

Following steps are used for design specification of an assembler-1)Identify the information necessary to perform a task2)Specify the data structure and define format of data structure3)Determine the processing necessary to obtain and maintain the information4)Determine the processing necessary to perform a task

Assembler Design can be done in

ndashSingle pass

ndashTwo pass

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 30

Source program

Pass1

OPTAB SYMTAB

Intermediate file

Pass2

SYMTAB

Object codes

OPTAB

SP NOTES

The Intermediate file include each source statement assigned address and error indicator

Pass I

Separate the symbols mnemonic op-code and operational fields Determine the storage requirement for every assembly language statement and up date the location

counter Build the symbol table (Table that is used to store each label and its corresponding value)

Pass II

Generate object code(perform actual translation)

One-Pass Assembler

The translation performed by an assembler is essentially a collection of substitutions machine operation code

for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines

bullSingle Pass AssemblerndashDoes everything in single passndashCannot resolve the forward referencing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 31

LOCCTR

SP NOTES

(detailed pass1 assembler flowchart)

Internal data structures1048698 the Operation Code Table (OPTAB) bull OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents1048698 the Symbol Table (SYMTAB) bull SYMTAB is used to store values (addresses) assigned to labels1048698 Location Counter (LOCCTR) bull This is a variable that is used to help in the assignment of address LOCCTR not LOCCTR + (instruction length)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 32

SP NOTES

Lecture_no20 Multi-pass assembler with example

Pass 1 Assign addresses to all statements in the program Save the values assigned to all labels for use in Pass 2 Perform some processing of assembler directivesPass 2 Assemble instructions Generate data values defined by BYTE WORD Perform processing of assembler directives not done in Pass 1 Write the object program and the assembly listing

(Detailed pass2 assembler flowchart)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 33

SP NOTES

Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler

The difference between one pass and two pass assemblers are-A one pass assembler passes over the source file exactly once in the same pass collecting the labels resolving future references and doing the actual assembly The difficult part is to resolve future label references and assemble code in one pass

A two pass assembler does two passes over the source file ( the second pass can be over a file generated in the first pass ) In the first pass all it does is looks for label definitions and introduces them in the symbol table In the second pass after the symbol table is complete it does the actual assembly by translating the operations and so on

Assembler Design Machine Dependent Assembler Features instruction formats and addressing modes program relocation

Machine Independent Assembler Features literals symbol-defining statements expressions program blocks control sections and program linking

Object Program

Header Col 1 H Col 2~7 Program name Col 8~13 Starting address (hex) Col 14-19 Length of object program in bytes (hex) Text Col1 T Col2~7 Starting address in this record (hex) Col 8~9 Length of object code in this record in bytes (hex) Col 10~69Object code (69-10+1)6=10 instructions End Col1 E Col 2~7 Address of 16 Col2 first executable instruction (hex) (END program_name)

Absolute ProgramThe program must be loaded at the specified (absolute) address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 34

SP NOTES

Relocatable Program

The actual starting address is not known until load time The program can be loaded at different address of memory

How to Solvea Modification recordb Base relativec Program counter relative

Modification record Col 1 M Col 2-7 Starting location of the address field to be modified relative to the beginning of the program Col 8-9 length of the address field to be modified in halfbytes

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 35

SP NOTES

Lecture_no22 Macro language amp Macro processor

MACRO PROCESSOR

The assembly language programmer often finds it necessary to repeat some statements or block of code several times in a program The block may consist of code to swap sets of registers do some arithmetic operations In this situation the programmer find a macro instruction facility useful Macro instruction (often called macros) are single line abbreviation for group of instructions In employing a macro the programmer essentially defines a single instruction to represent a block of code For every occurrence of this one-line macro instruction in his program the macro processing assembler substitute the entire block

A macro represents a commonly used group of statements in the source pro ramming language The macro processor replaces each macro instruction with the corresponding group of source language statements This is called expanding of macros

Obj

Basic Macro Processor Functions

Macro Definition and Usage

In this section we will present one more example to highlight salient aspects of macro-processor The example is very similar to Intels 8 bit microprocessor assembly language instruction

ExampleMACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 36

Source code (with macro)

Macro Processor

Expanded code

Compiler Assembler

SP NOTES

A macro definition is placed at the start of a program enclosed between the statements MACRO and ENDM A MACRO statement indicates that a macro definition starts while statement ENDM indicates the end of a macro definition Thus a group of statements start ing with MACRO and ending with ENDM constitutes one macro definition unit If many macros are to be defined in a program as many definition

modules will exist at the start of the program Each definition module contains a new operation and defines it to consist of a sequence of assembly language statement In example above INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence

The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement Appearance of a macro name in the mnemonic field amounts to a call on the macro The assembler replaces such a statement by the statement sequence comprising the macro This is known as macro expansion

INCRMT XY

is shown to lead to insertion of the assembly statements

LOAD X

ADD Y

STORE X

in its place All macro calls in a program are expanded in this fashion

Defining a Macro

Let us take another look at the macro definition unit

MACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

The macro header statement indicates the existence of a macro definition unit Absence of the header statement as the first statement of a program or ft first statement following a macro definition unit signals the start of the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 37

SP NOTES

main assembly language program The next statement in the definition unit is die prototype for a macro call This statement names the macro and indicates how the operands in any call on the macro would be written

The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion

Basic Macro Processor Functions

-gtDirectives used during usage of Macro

Macro Indicates begin of Macro MEND indicates end of Macro

-gtPrototype for Macro

Each argument starts with Name and macro Parameter list

MEND

-gtBODY The statement will be generated as the expansion of Macro

Macro Expansion

Each macro invocation statement will be expanded into the statements that form the body of the macro

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 38

SP NOTES

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)

In the definition of macro parameter In the macro invocation argument

Lecture_no23 Machine-Independent Macro Processor Features

MACRO INVOCATION

A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro

macro_name p1 p2 hellip

Difference between macro call and procedure call

Macro call statements of the macro body are expanded each time the macro is invoked

Procedure call statements of the subroutine appear only one regardless of how many times the subroutine is called

Macro parameters

i) Positional parameter

The prototype statement indicates how operands in a macro call would be written These operands are called parameters or arguments All parameters used in the prototype statement have names starting with the special character amp These parameters are known as formal parameters A macro call is written using parameter names which do not start with ft special character amp These are known as actual parameters

The lists of formal and actual parameters also called as formal and actual parameter lists specified in the prototype and macro call statements respectively establish a correspondence between each formal parameter and an actual parameter In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists Thus the first actual parameter in the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 39

SP NOTES

fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again

INCRMT ampA ampB prototype

INCRMT XY macro call

We see that X would be paired with ampA and Y with ampB While expanding a macro call any formal parameter appearing within a model statement is replaced by the corresponding actual parameter This is how expansion of the call INCR XY heads to the following statements

LOAD X

ADD Y

STORE X

Example

STRG DATA1 DATA2 DATA3

GENER DIRECT3

(ii) Keyword parameter

Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter

ExampleSTRG ampa3=DATA1 ampa2=DATA2 ampa1=DATA3

GENER TYPE=DIRECT CHANNEL=3

Features of macro facility

(1)Macro instruction argument

(2)Macro calls within macro(nested macro calls)

(3)Macro instruction defining(macros within macro)

(4)Conditional macro ExpansionLecture_no24 Design of 1-pass macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 40

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

CLASS-WISE NOTES ON SYSTEM PROGRAMMING

Lecture_no1

PREREQUISITES

One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization

In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software

AIM

To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler

OBJECTIVES

To understand the relationship between system software and machine architecture To know the design and implementation of Assemblers To know the design and implementation of Linkers and Loaders To have an understanding of macroprocessors To know the function of Compiler

EXPECTED OUTCOME

This course provides knowledge to design various system programs

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 2

SP NOTES

MODULE-1

Lecture_no2 Introduction to software its types machine structure

bullDefinition of software its types example its application

bullWhat is Programming

bullWhat is system programming

Ans System programming is a branch of computer science and engineering that deals with the structure of the machine The structure of a machine may compose of memory IO processor CPU (Central Processing Unit) card reader and printer etc In other words system programming is one of high level procedural language knowledge of data structure and computer organization

bullDifferentiate between system software and application software

bull Define computer architecture

bull Define computer organization

bullGeneral hardware organization of a computer system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 3

SP NOTES

Lecture_no3 Evolution of components of system programming

Assembler

An assembler translates assembly language into machine code Assembly language consists of mnemonics for machine opcodes so assemblers perform a 11 translation from mnemonic to a direct instruction For example

LDA 4 converts to 0001001000100100

Conversely one instruction in a high level language will translate to one or more instructions at machine level

Advantages of using an Assembler

Very fast in translating assembly language to machine code as 1 to 1 relationshipAssembly code is often very efficient (and therefore fast) because it is a low level languageAssembly code is fairly easy to understand due to the use of English-like mnemonics

Disadvantages of using Assembler

Assembly language is written for a certain instruction set andor processorAssembly tends to be optimised for the hardware its designed for meaning it is often incompatible with

different hardwareLots of assembly code is needed to do relatively simple tasks and complex programs require lots of

programming time

Compiler

A Compiler is a computer program that translates code written in a high level language to a lower level language objectmachine code The most common reason for translating source code is to create an executable program (converting from a high level language into machine language)

Advantages of using a compiler

Source code is not included therefore compiled code is more secure than interpreted codeTends to produce faster code than interpreting source codeProduces an executable file and therefore the program can be run without need of the source code

Disadvantages of using a compiler

Object code needs to be produced before a final executable file this can be a slow processThe source code must be 100 correct for the executable file to be produced

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 4

SP NOTES

Interpreter

An interpreter program executes other programs directly running through program code and executing it line-by-line As it analyses every line an interpreter is slower than running compiled code but it can take less time to interpret program code than to compile and then run it mdash this is very useful when prototyping and testing code Interpreters are written for multiple platforms this means code written once can be run immediately on different systems without having to recompile for each Examples of this include flash based web programs that will run on your PC MAC games console and Mobile phone

Advantages of using an Interpreter

Easier to debug(check errors) than a compilerEasier to create multi-platform code as each different platform would have an interpreter to run the same

codeUseful for prototyping software and testing basic program logic

Disadvantages of using an Interpreter

Source code is required for the program to be executed and this source code can be read making it insecureInterpreters are generally slower than compiled programs due to the per-line translation method

1)Name the three types of program translator and describe what level of language each translates

Answer

Assembler - Low Level (Assembly) Compiler - High Level Interpreter - High Level

2) Why might you want to use a compiler instead of an interpreter

A compiler makes faster more secure code

3) Why might you want to use an interpreter instead of a compiler

Interpreters allow for code to run on multiple platforms it allows you to debug and test code without having to re-compile the entire source code

Loader

A Loader is system program that place the object program into main memory and prepares it for execution

bull Basic functions of loader

ndash Allocation

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 5

SP NOTES

ndash Linking

ndash Relocation

ndash Loading

Types of Loader-

bull Compile-and-go Loader

bull Relocating Loader

bull Direct Linking Loader

bull Absolute Loader

bull General Loader

bull Dynamic Loader

Macro amp Macro processor

Macro ndash Macro is a single line abbreviation for a group of instruction

MACRO -------- Start of definition

INCR -------- Macro name

A 1DATA

A 2DATA Sequence of instructions to be abbreviated

A 3DATA

MEND -------- End of definition

Linking and Linker

bull Linking ndash The Process of merging many object modules to form a single object program is called as linking

bull Linker bull The Linker is the software program which binds many object modules to make a single object program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 6

SP NOTES

Lecture_no4 Foundation of system programming

1) Formal Systembull A formal system is an un interpreted calculus

It consists of

ndash Alphabets

ndash A set of words called Axioms

ndash Finite set of relations called rules of inference or production rules

ndash Ex Boolean algebra

2) Need Of System Software-The basic need of system software is to achieve the following goals -

bull To achieve efficient performance of the system

bull To make effective execution of general user program

bull To make effective utilization of human resources

bull To make available new better facilities

Need of system programming

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 7

SP NOTES

Lecture_no5 Evolution of operating systemGeneral machine structure

OSJobmultiprogrammingMultitasking

FragmentationPagingScheduling

Facilities of computer os

Operating Systembull It is the collection of system programs which acts as an interface between user and the computer and computer hardware

bull The purpose of an operating system is to provide an environment in which A user can execute programs in a convenient manner

Functions of Operating System

bull File handling and management

bull Storage management (Memory management)

bull Device scheduling and management

bull CPU scheduling

bull Information management

bull Process control (management)

bull Error handling

bull Protecting itself from user amp protecting user from other users

What is MFT

What is MVT

What is Fragmentation

What is paging Explain its various types

What is scheduling

What is meant by the term time sharing

What is virtual memoryWhat are simple batched systems Explain

Give an evolution of an operating system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 8

SP NOTES

Lecture_no6 Description of Von-Neumann machines components

General machine structureVon-Neumannrsquos componentsBlock diagram and functions of each component of Von-neumannrsquos machine

1 The von Neumann Computer Modelbull Von Neumann computer systems contain three main building blocks-gt the central processing unit (CPU)-gt memory-gt and inputoutput devices (IO)bull These three components are connected together using the system busbull The most prominent items within the CPU are the registers they can be manipulated directly by acomputer program The following block diagram shows major relationship between CPU components

(Design of the VonNeumann architecture)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 9

SP NOTES

2 Components of the Von Neumann Model-gt Memory Storage of information (dataprogram)-gt Processing Unit ComputationProcessing of Information-gt Input Means of getting information into the computer eg keyboard mouse-gtOutput Means of getting information out of the computer eg printer monitor bull Control Unit Makes sure that all the other parts perform their tasks correctly and at the correct time The von Neumann Machine

(connection between CPU amp Main memory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 10

SP NOTES

MBR (Memory Buffer Register) = MBR contains a word to be stored in memory or is sued to receive from memory

MAR (Memory Address Register) = MAR specifies the address in memory of the word to be written from or read into the MBR

IR (Instruction Register) = IR contains the 8-bit op-code instruction being executedIBR (Instruction Buffer Register) = IBR is employed to hold temporarily the right-hand instructions

from a word in memoryPC (Program Counter) = PC contains the address of the next instruction-pair to be fetched from

memoryAC and MQ (Accumulator and Multiplier Quotient) = AC and MQ are employed to hold temporarily

operands and results of ALU operations

Fetch The process of retrieving a value from a memory location in order to put it in a register or pass it to a processing unit

IO controller A special-purpose processor that mediates between the main computer processor and a specific input or output device The processor records the request made by the main processor leaving it free to continue other tasks and interrupts the main processor when the input or output task is complete

Input Output The subsystem responsible for managing communications with external devices whether external memory storage other processors or devices for human interaction

Instruction set The set of codes that describe all the legal operations a particular computer can execute

Op code An unsigned binary integer that is assigned to a specific task the hardware can perform

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 11

SP NOTES

Register A special kind of very fast memory which is referred to by name rather than by a memory address number Registers often hold data values that are currently in use by the ALU or other subsystems of the architecture

Random access memory (RAM) The typical computer memory where each cell is addressed individually and may be updated or accessed in time that is independent of the cellrsquos location

Read-only memory (ROM) A memory that is usually very similar to RAM except it may only be accessed not updated

Store The process of writing a value to a memory cell

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 12

SP NOTES

Lecture_no7 Instruction format with microflowchart for ADD instruction

Flowchart for instruction fetchamp execution of general machine structure

Typical operating system Information flow in microflowchart for ADDSUBMULT

Memory operations

FETCH (addr)

1Put addr into MAR2Tell memory unit to ldquoloadrdquo3Memory copies data into MDR

STORE(addrnew-value)

1 Put addr into MAR2 Put new-value into MDR3 Tell memory unit to ldquostorerdquo4 Memory stores data from MDR into memory cell

Problem 1)

a) Write an assembly program for a GPR machine having 8 general purpose registers (R1R7) to evaluate the following expression Ignore the remainder part of the division process

( (a + b) (a + d) ) (a e)

where abde are variables with addresses ABDE respectively and each operation is defined as follows + --------gt ADD Ri Rj ( Rj lt- Ri + Rj) --------gt MULT Ri Rj ( Rj lt- Ri Rj) --------gt DIV Ri Rj ( Rj lt- Rj Ri)

a = M[A] ------gt content of memory location at address A same for B D E

move A D0 D0 = amove B D4 D4 = badd D0 D4 D4 = a + bmove D D1 D1 = dadd D0 D1 D1 = a + ddiv D1D4 D4 = (a + b) (a + d)move E D2 D2 = emult D0 D2 D2 = (a e)mult D4 D2 D2 = ( (a + b) (a + d) ) (a e)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 13

SP NOTES

Problem 2)

a) Provide micro-instructions in the form of detailed flowchart for the execution of the following instructions

b) Show the flow of the micro-instructions within the GPR architecture by highlighting the system buses and the registers used

i)move X D0 move contents of memory location X to register D0ii)move D0 X move contents of register D0 to memory location Xiii)movea 22(a3)25(a1) move contents from memory to memoryiv)bgtw label branch if greater than to location ldquolabelrdquo

Answer 2)

For each of the above for part b) draw the CPU consisting of all the registers (for example MAR MBR PC IR Address and Data Registers ALU) and Memory and show each of the micro-instruction as in part a)

i) move X D0Machine Instruction

0011 0000 0011 1000

Address of X

a) Micro-steps are

MAR lt-- PC

MBR lt-- M[MAR] fetch

IR lt-- MBR

| decode |PC lt-- PC + 2 PC points to source addressMAR lt-- PCMBR lt-- M[MAR] address of X in MBRMAR MBR move X to MARMBR M[MAR] read contents of location XD0 MBR PC lt-- PC + 2 PC points to next instruction

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 14

SP NOTES

ii) move D0 X

Machine Instruction

0011 1110 0000 0000

Address of X

a) Micro-steps areMAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | decode |PC lt-- PC + 2 PC points to destination addressMAR lt-- PCMBR M[MAR] read destination address XMAR MBR MAR has X now

MBR D0 source data in MBR

M[MAR] MBR data in Effective address X (which is in MAR)

PC lt-- PC + 2 PC points to next instruction

iii) movea 22(a3)25(a1)

Machine Instruction

0011 0011 0110 1011

0000 0000 0001 0110

0000 0000 0001 1001

a)Source Effective address = contents of a3 + $16temp lt--- M[a3 + $16]

Destination Effective address = contents of a1 + $19M[a1 + $19] lt- temp

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 15

SP NOTES

Micro-steps areMAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | decode |PC lt-- PC + 2 PC points to source displacementMAR lt-- PCMBR lt-- M[MAR] source displacement loadedMAR lt-- [A3 + MBR] source effective address calculatedMBR lt- M[MAR] source data in MBRWR lt- MBR saved temporarily since MBR is to be reusedPC lt-- PC + 2 PC points to destination displacementMAR lt-- PCMBR lt-- M[MAR] destination displacement loaded

MAR lt-- [A1 + MBR] destination effective address calculatedMBR lt- WR source data back to MBR

M[MAR] lt-- MBR source data moved to destination

PC lt-- PC + 2 PC points to next instruction

iv) bgtw label

Machine Instruction

0110 1110 0000 0000 ---- displacement ----

a)MAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | interpret decode |

TRUE FALSE | |

PC lt-- PC + 2 PC lt- PC + 4 MAR lt-- PC MBR lt-- M[MAR] get displacement PC lt-- PC + MBR execute

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 16

SP NOTES

Lecture_no8 Memory unit Register Data of IBM 360

General machine structure of IBM 360370

The IBM System360 (S360) was a mainframe computer system family announced by IBM on April 7 1964 and delivered between 1965 and 1978 It was the first family of computers designed to cover the complete range of applications from small to large both commercial and scientific

(i)Memory unit

Memory (storage) in System360 is addressed in terms of 8-bit bytes Various instructions operate on larger units called halfword (2 bytes) fullword (4 bytes) doubleword (8 bytes) quad word (16 bytes) and 2048 byte storage block specifying the leftmost (lowest address) of the unit Within a halfword fullword doubleword or quadword low numbered bytes are more significant than high numbered bytes this is sometimes referred to as big-endian Many uses for these units require aligning them on the corresponding boundaries Within this article the unqualified term word refers to a fullword

The architecture of System360 provided for up to 224 = 16777216 bytes of memory however the Model 67 extended the architecture and allowed 232 = 4294967296 bytes of virtual memory

(ii)Register-

Registers are areas of storage that are set aside in the processing unit

There are 16 registers in the 370 system (General purpose registerFloating-point registerControl amp status registerdata register address register)

Four fields of information-1) the name field- optional field and must begin with a character from the alphabet can consist of 8 characters2) opcode ndash mandatory field and must be followed by a blank3) operands ndash usually represent data that could be in main storage registers or as part of the instruction itself4) comments ndash used to clarify instructions

(iii)Data

Characters are stored as 8-bit bytes Integers are stored as twos complement binary halfword or fullword values Packed decimal numbers are stored as 1-16 8-bit bytes containing an odd number of decimal digits

followed by a 4-bit sign Sign values of hexadecimal A C E and F are positive and sign values of hexadecimal B and D are negative Digit values of hexadecimal A-F and sign values of 0-9 are invalid but the PACK and UNPK instructions do not test for validity

Zoned decimal numbers are stored as 1-16 8-bit bytes each containing a zone in bits 0-3 and a digit in bits 4-7 The zone of the rightmost byte is interpreted as a sign

Floating point numbers are only stored as fullword or doubleword values on older models On the 36085and 360195there are also extended precision floating point numbers stored as quadwords For all

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 17

SP NOTES

three formats bit 0 is a sign and bits 0-7 are a characteristic (exponent biased by 64) Bits 8-31 (8-63) are a hexadecimal fraction For extended precision the low order doubleword has its own sign and characteristic which are ignored on input and generated on output

Lecture_no9 Various instruction format of IBM 360370

( iv)Instruction

Instructions in the S360 are two four or six bytes in length with the opcode in byte 0 Instructions have one of the following formats

RR (two bytes) R tells that there is data residing in a register So in this type there is data in register R

Generally byte 1 specifies two 4-bit register numbers but in some cases eg SVC byte 1 is a single 8-bit immediate field

RS (four bytes) This instruction can have 2 or 3 operands It depends how many the opcode indicates If it is 2 the first is a register and the second storage If it is 3 then the first and second are registers and the third is a storage location

Byte 1 specifies two register numbers bytes 2-3 specify a base and displacement

RX (four bytes) The X also refers to a storage address however this specifies actual storage addresses in 3 pieces

Byte 1 bits 0-3 specifies either a register number or a modifier byte 1 bits 4-7 specifies the number of the general register to be used as an index bytes 2-3 specify a base and displacement

SI (four bytes) I tells that the operand consists of immediate data (meaning that the data is in the instruction itself) So in this type the first is a storage location and the second is immediate data

Byte 1 specifies an immediate field bytes 2-3 specify a base and displacement

SS (six bytes) S tells that that there is data in storage Here both operands have data in storage

Byte 1 specifies two 4-bit length fields or one 8-bit length field bytes 2-3 and 4-5 each specify a base and displacement The encoding of the length fields is length-1

Instructions must be on a two-byte boundary in memory hence the low-order bit of the instruction address is always 0

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 18

SP NOTES

Addressing Mode

1Implied mode Operand is in a register implied by the OpCode of the instruction ExampleADD 312 Immediate mode Operand is a constant value specified in the instruction itself ExampleADD R 103 Register mode Operand is in register specified in the instruction itself ExampleADD S D

4 Register indirect mode Operand are is in a memory address that is content of a register specifiedby the instruction itself ExampleADD (D) 35 Direct mode Absolute address of operand is explicitly specified in instruction ExampleADD 1234 S6 Indirect mode Absolute address of operand is content of a memory address ExampleADD [1234] 107 Relative mode Content of PC + OpAddr ExampleADD D $S8 Indexed mode Content of an index register + OpAddr ExampleADD D 500(S)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 19

SP NOTES

Lecture_no10 Machine language Long way no looping method

Elementary Instruction Set

Typical Data Transfer Instructions

Typical Arithmetic Instructions

Typical Logical and Bit Manipulation Instr

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 20

SP NOTES

Example Write a program that compare two positive numbers x and y Output is -1 if x lt y 0if x = y +1 if x gt y

MOVE R1 x MOVE R2 y MOVE R3 0 SUB R1 R2 BN Less BZ Zero MOVE R3 +1Less MOVE R3 -1Zero MOVE R3 0 HLT

Lecture_no11 Address modification using instruction as data ampusing index register

Lecture_no12 Looping method with example

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 21

SP NOTES

Lecture_no13 Assembly language machine language

Assembly LanguageAssembly instructions are just shorthand for machine instructions

Machine language Equivalent assembly1000000100100101 LOAD R1 51000000101000101 LOAD R2 51010000100000110 ADD R0 R1 R21000001000000110 SAVE R0 61111111111111111 HALT (For all assembly instructions that compute a result the first argument is the destination)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 22

SP NOTES

Very easy to write an Assembly Language -gtMachine language translator

Exercise

What would be the assembly instructions to swap the contents of registers 1 amp 2Exercise Solution

STORE 1 R1MOVE R1 R2LOAD R2 1HALT

Machine Language

The machine language for a particular computer is tied to the architecture of the CPU For example G4 Macs have a different machine language than Intel PCs We will look at the machine language of a simple simulated computer

Machine Language ExampleOur computer has 4 registers and 32 memory locations Each instruction is 16 bits Here is a machine language program for our simulated computer

10000001001001011000000101000101101000010000011010000010000001101111111111111111

LOAD contents of memorylocation into register

100000010 RR MMMMMex R0 = Mem[3]100000010 00 00011

STORE contents of registerinto memory location

100000100 RR MMMMMex Mem[4] = R0100000100 00 00100

MOVE contents of oneregister into another register

100100010000 RR RRex R0 = R1100100010000 00 01

ADD contents of 2 registersstore result in third

1010000100 RR RR RRex R0 = R1 + R21010000100 00 01 10

SUBTRACT contents of 2registers store result into third

1010001000 RR RR RRex R0 = R1 ndash R2

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 23

SP NOTES

1010001000 00 01 10

Halt the program 1111111111111111

Lecture_no14 Meaning of Assembly language program types Literals

bullUSING

bullBALR

bullSTART

bullEND

bullBR

bullLiteral

bullLTORG

bullBCT

bullEQU ExampleSTART EQU 10

The directive EQU stands for equals it allows you to give a numerical value to a label In this example the assembler would always replace START with 1016 Hence you could write

START EQU 10ORG STARTGOTO 10

bullORG

The directive ORG does not get translated into any code what it does is to set the address at which the next instruction will be stored

Example

ORG 0GOTO 10

The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000

But it will be placed at the address 0 ORG 0 does not get translated into any code but it affects where the code for GOTO 10 is placed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 24

SP NOTES

bullpseudo-ops vs machine-ops

Lecture_no15 Example on Assembly language program

Assembly language is a low-level programming language for a computer or other programmable device specific to a particular computer architecture in contrast to most high-level programming languages which are generally portable across multiple systems Assembly language is converted into executable machine code by a utility program referred to as an assembler like NASM MASM etc

LABEL OPCODE OPERANDS COMMENTS01 02 Program to multiply an integer by the constant 603 Before execution an integer must be stored in NUMBER0405 ORIG x305006 LEA R2 NUMBER07 LDW R2 R2 008 LEA R1 SIX09 LDW R1 R1 00A AND R3R30 Clear R3 It will0B contain the product0C The inner loop0D 0E AGAIN ADD R3R3R20F ADD R1R1-1 R1 keeps track of10 BRp AGAIN the iterations11 12 HALT13

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 25

SP NOTES

14 NUMBER BLKW 115 SIX FILL x000616 17 END

Figure 71 An assembly language programTwo of the parts (LABEL and COMMENTS) are optionalOpcodes and OperandsTwo of the parts (OPCODE and OPERANDS) are mandatory The OPCODE is a symbolic name for the opcode

The number of operands depends on the operation being performedAGAIN ADD R3R3R2The operands to be added are obtained from register 2 and from register 3 The resultis to be placed in register 3 We represent each of the registers 0 through 7 as R0 R2helliphelliphelliphellip R7

The location whose address is to be read is given the label NUMBER The destination into which that address is to be loaded is register 2LEA R2 NUMBERA literal value must contain a symbol identifying the representation base of the number We use for decimal x for hexadecimal and b for binary

LabelsLabels are symbolic names which are used to identify memory locations that are referred to explicitly in the programThere are two reasons for explicitly referring to a memory location1 The location contains the target of a branch instruction (for example AGAIN in line 0E)2 The location contains a value that is loaded or stored (for example NUMBER line 14 and SIX line 15)The location AGAIN is specifically referenced by the branch instruction in line 10

BRp AGAIN

If the result of ADD R1R1ndash1 is positive (as evidenced by the P condition code being set) then the program branches to the location explicitly referenced as AGAIN to perform another iterationThe value stored in the memory location explicitly referenced as NUMBER is loaded into R2If a location in the program is not explicitly referenced then there is no need to give it a label

CommentsComments are messages intended only for human consumption The purpose of comments is to make the program more comprehensible to the human reader

Pseudo-ops (Assembler Directives)

Actually a more formal name for a pseudo-op is assembler directive They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution Rather the pseudo-op is strictly a message to the assembler to help the assembler in the assembly processORIG

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 26

SP NOTES

ORIG tells the assembler where in memory to place the program In line 05 ORIG x3050 says start with location x3050 As a result the LEA R2NUMBER instruction will be put in location x3050FILLFILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand In line 15 the ninth location in the resultant program is initialized to the value x0006BLKWBLKW tells the assembler to set aside some number of sequential memory locations (ie a BLocK ofWords) in the program The actual number is the operand of the BLKW pseudo-op In line 11 the pseudo-op instructs the assembler to set aside one location in memory

Lecture_no16 surprise test

Lecture_no17 MODULE-2

AssemblerWhy learning AssemblerAssembler or other languages that is the question Why should I learn another language if I alreadylearned other programming languages The best argument while you live in France you are able to getthrough by speaking english but you will never feel at home then and life remains complicated You canget through with this but it is rather inappropriate If things need a hurry you should use the countryslanguageAssemblerTo translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmerDisassenblerTo translate machine code to their assembly source

Cross AssemblerThe assembler may run on one machine and produce object code for another

An assembly is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader

(Fig Assembler)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 27

SP NOTES

For example the externally defined symbols (library program) must be indicated to the loaderthe assembler does not know the address of these symbols and it is up to the loader to find theprograms containing them load them into memory and place the values of these symbols in thecalling program

(Fig Program Translation Steps)

Fundamental assembler functions1048698 Translate mnemonic operation codes their machine language equivalents1048698 Assign machine addresses to symbolic labels used by the programmer

Basic Assembler Functions

Convert mnemonic operation codes to machine language equivalents

Convert symbolic operands to machine addresses (pass 1)

Build machine instructions in the proper format

Convert data constants to internal (machine)representations

Error checking is provided

Write the object program and assembly listing files

Assembler directives

-gtThe assembler can also process assembler directivesbull Assembler directives (or pseudo-instructions) provide instructions to the assembler itself Theyare not translated into machine instructions

-gtPseudo-Instructions Not translated into machine instructions Providing information to the assembler-gtBasic assembler directives-

bull START -Specify name and staring address for the program END- Indicate the end of the source program and (optionally) specify the first executable instruction in

the program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 28

SP NOTES

BYTE -Generate character or hexadecimal constant occupying as many bytes as needed to represent the constant

WORD- Generate one-word integer constant RESB- Reserve the indicated number of bytes for a data area RESW- Reserve the indicated number of words for a data area

Lecture_no18 General design procedure of 360-type Assembler

Syntax Assembly language statements

Assembly language statements are entered one statement per line Each statement follows the following format

[label] mnemonic [operands] [comment]

The fields in the square brackets are optional A basic instruction has two parts the first one is the name of the instruction (or the mnemonic) which is to be executed and the second are the operands or the parameters of the command

An assembly language statement contains the following fields -gt Label Field -it is an identifier amp optional fieldThe maximum lengnth of label differs between assemblers It can be used to define a symbolSome accepts upto 32 characters long other only 4 charactersA label when declared is suffixed by a colon and begins with valid characters(AhelliphellipZ)

Ex-

START LDAA 24 H

Here Label START is equal to the address of the instruction LDAA 24 H -gt MnemonicOpcode -This field contains a mnemonic It stands for operation code or pseudo-opie a machine code instruction it requires additional information ieoperands -gt Operand Field -it specifies either the address or the data that the opcode requires -gt Comment Field allows the programmer to document the software This field is optional and is only printed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 29

SP NOTES

on the source listing for documentation purposes The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character

Types of Assembly language statements

It consists of 3 typeseg

(i)Imperative statement

(ii)Declarative statements

(iii)Assembler directive statements

Advantages amp disadvantages of Assembly language

Lecture_no19 Design Specification of 1-pass assembler with example

Following steps are used for design specification of an assembler-1)Identify the information necessary to perform a task2)Specify the data structure and define format of data structure3)Determine the processing necessary to obtain and maintain the information4)Determine the processing necessary to perform a task

Assembler Design can be done in

ndashSingle pass

ndashTwo pass

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 30

Source program

Pass1

OPTAB SYMTAB

Intermediate file

Pass2

SYMTAB

Object codes

OPTAB

SP NOTES

The Intermediate file include each source statement assigned address and error indicator

Pass I

Separate the symbols mnemonic op-code and operational fields Determine the storage requirement for every assembly language statement and up date the location

counter Build the symbol table (Table that is used to store each label and its corresponding value)

Pass II

Generate object code(perform actual translation)

One-Pass Assembler

The translation performed by an assembler is essentially a collection of substitutions machine operation code

for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines

bullSingle Pass AssemblerndashDoes everything in single passndashCannot resolve the forward referencing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 31

LOCCTR

SP NOTES

(detailed pass1 assembler flowchart)

Internal data structures1048698 the Operation Code Table (OPTAB) bull OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents1048698 the Symbol Table (SYMTAB) bull SYMTAB is used to store values (addresses) assigned to labels1048698 Location Counter (LOCCTR) bull This is a variable that is used to help in the assignment of address LOCCTR not LOCCTR + (instruction length)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 32

SP NOTES

Lecture_no20 Multi-pass assembler with example

Pass 1 Assign addresses to all statements in the program Save the values assigned to all labels for use in Pass 2 Perform some processing of assembler directivesPass 2 Assemble instructions Generate data values defined by BYTE WORD Perform processing of assembler directives not done in Pass 1 Write the object program and the assembly listing

(Detailed pass2 assembler flowchart)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 33

SP NOTES

Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler

The difference between one pass and two pass assemblers are-A one pass assembler passes over the source file exactly once in the same pass collecting the labels resolving future references and doing the actual assembly The difficult part is to resolve future label references and assemble code in one pass

A two pass assembler does two passes over the source file ( the second pass can be over a file generated in the first pass ) In the first pass all it does is looks for label definitions and introduces them in the symbol table In the second pass after the symbol table is complete it does the actual assembly by translating the operations and so on

Assembler Design Machine Dependent Assembler Features instruction formats and addressing modes program relocation

Machine Independent Assembler Features literals symbol-defining statements expressions program blocks control sections and program linking

Object Program

Header Col 1 H Col 2~7 Program name Col 8~13 Starting address (hex) Col 14-19 Length of object program in bytes (hex) Text Col1 T Col2~7 Starting address in this record (hex) Col 8~9 Length of object code in this record in bytes (hex) Col 10~69Object code (69-10+1)6=10 instructions End Col1 E Col 2~7 Address of 16 Col2 first executable instruction (hex) (END program_name)

Absolute ProgramThe program must be loaded at the specified (absolute) address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 34

SP NOTES

Relocatable Program

The actual starting address is not known until load time The program can be loaded at different address of memory

How to Solvea Modification recordb Base relativec Program counter relative

Modification record Col 1 M Col 2-7 Starting location of the address field to be modified relative to the beginning of the program Col 8-9 length of the address field to be modified in halfbytes

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 35

SP NOTES

Lecture_no22 Macro language amp Macro processor

MACRO PROCESSOR

The assembly language programmer often finds it necessary to repeat some statements or block of code several times in a program The block may consist of code to swap sets of registers do some arithmetic operations In this situation the programmer find a macro instruction facility useful Macro instruction (often called macros) are single line abbreviation for group of instructions In employing a macro the programmer essentially defines a single instruction to represent a block of code For every occurrence of this one-line macro instruction in his program the macro processing assembler substitute the entire block

A macro represents a commonly used group of statements in the source pro ramming language The macro processor replaces each macro instruction with the corresponding group of source language statements This is called expanding of macros

Obj

Basic Macro Processor Functions

Macro Definition and Usage

In this section we will present one more example to highlight salient aspects of macro-processor The example is very similar to Intels 8 bit microprocessor assembly language instruction

ExampleMACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 36

Source code (with macro)

Macro Processor

Expanded code

Compiler Assembler

SP NOTES

A macro definition is placed at the start of a program enclosed between the statements MACRO and ENDM A MACRO statement indicates that a macro definition starts while statement ENDM indicates the end of a macro definition Thus a group of statements start ing with MACRO and ending with ENDM constitutes one macro definition unit If many macros are to be defined in a program as many definition

modules will exist at the start of the program Each definition module contains a new operation and defines it to consist of a sequence of assembly language statement In example above INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence

The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement Appearance of a macro name in the mnemonic field amounts to a call on the macro The assembler replaces such a statement by the statement sequence comprising the macro This is known as macro expansion

INCRMT XY

is shown to lead to insertion of the assembly statements

LOAD X

ADD Y

STORE X

in its place All macro calls in a program are expanded in this fashion

Defining a Macro

Let us take another look at the macro definition unit

MACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

The macro header statement indicates the existence of a macro definition unit Absence of the header statement as the first statement of a program or ft first statement following a macro definition unit signals the start of the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 37

SP NOTES

main assembly language program The next statement in the definition unit is die prototype for a macro call This statement names the macro and indicates how the operands in any call on the macro would be written

The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion

Basic Macro Processor Functions

-gtDirectives used during usage of Macro

Macro Indicates begin of Macro MEND indicates end of Macro

-gtPrototype for Macro

Each argument starts with Name and macro Parameter list

MEND

-gtBODY The statement will be generated as the expansion of Macro

Macro Expansion

Each macro invocation statement will be expanded into the statements that form the body of the macro

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 38

SP NOTES

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)

In the definition of macro parameter In the macro invocation argument

Lecture_no23 Machine-Independent Macro Processor Features

MACRO INVOCATION

A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro

macro_name p1 p2 hellip

Difference between macro call and procedure call

Macro call statements of the macro body are expanded each time the macro is invoked

Procedure call statements of the subroutine appear only one regardless of how many times the subroutine is called

Macro parameters

i) Positional parameter

The prototype statement indicates how operands in a macro call would be written These operands are called parameters or arguments All parameters used in the prototype statement have names starting with the special character amp These parameters are known as formal parameters A macro call is written using parameter names which do not start with ft special character amp These are known as actual parameters

The lists of formal and actual parameters also called as formal and actual parameter lists specified in the prototype and macro call statements respectively establish a correspondence between each formal parameter and an actual parameter In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists Thus the first actual parameter in the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 39

SP NOTES

fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again

INCRMT ampA ampB prototype

INCRMT XY macro call

We see that X would be paired with ampA and Y with ampB While expanding a macro call any formal parameter appearing within a model statement is replaced by the corresponding actual parameter This is how expansion of the call INCR XY heads to the following statements

LOAD X

ADD Y

STORE X

Example

STRG DATA1 DATA2 DATA3

GENER DIRECT3

(ii) Keyword parameter

Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter

ExampleSTRG ampa3=DATA1 ampa2=DATA2 ampa1=DATA3

GENER TYPE=DIRECT CHANNEL=3

Features of macro facility

(1)Macro instruction argument

(2)Macro calls within macro(nested macro calls)

(3)Macro instruction defining(macros within macro)

(4)Conditional macro ExpansionLecture_no24 Design of 1-pass macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 40

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

MODULE-1

Lecture_no2 Introduction to software its types machine structure

bullDefinition of software its types example its application

bullWhat is Programming

bullWhat is system programming

Ans System programming is a branch of computer science and engineering that deals with the structure of the machine The structure of a machine may compose of memory IO processor CPU (Central Processing Unit) card reader and printer etc In other words system programming is one of high level procedural language knowledge of data structure and computer organization

bullDifferentiate between system software and application software

bull Define computer architecture

bull Define computer organization

bullGeneral hardware organization of a computer system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 3

SP NOTES

Lecture_no3 Evolution of components of system programming

Assembler

An assembler translates assembly language into machine code Assembly language consists of mnemonics for machine opcodes so assemblers perform a 11 translation from mnemonic to a direct instruction For example

LDA 4 converts to 0001001000100100

Conversely one instruction in a high level language will translate to one or more instructions at machine level

Advantages of using an Assembler

Very fast in translating assembly language to machine code as 1 to 1 relationshipAssembly code is often very efficient (and therefore fast) because it is a low level languageAssembly code is fairly easy to understand due to the use of English-like mnemonics

Disadvantages of using Assembler

Assembly language is written for a certain instruction set andor processorAssembly tends to be optimised for the hardware its designed for meaning it is often incompatible with

different hardwareLots of assembly code is needed to do relatively simple tasks and complex programs require lots of

programming time

Compiler

A Compiler is a computer program that translates code written in a high level language to a lower level language objectmachine code The most common reason for translating source code is to create an executable program (converting from a high level language into machine language)

Advantages of using a compiler

Source code is not included therefore compiled code is more secure than interpreted codeTends to produce faster code than interpreting source codeProduces an executable file and therefore the program can be run without need of the source code

Disadvantages of using a compiler

Object code needs to be produced before a final executable file this can be a slow processThe source code must be 100 correct for the executable file to be produced

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 4

SP NOTES

Interpreter

An interpreter program executes other programs directly running through program code and executing it line-by-line As it analyses every line an interpreter is slower than running compiled code but it can take less time to interpret program code than to compile and then run it mdash this is very useful when prototyping and testing code Interpreters are written for multiple platforms this means code written once can be run immediately on different systems without having to recompile for each Examples of this include flash based web programs that will run on your PC MAC games console and Mobile phone

Advantages of using an Interpreter

Easier to debug(check errors) than a compilerEasier to create multi-platform code as each different platform would have an interpreter to run the same

codeUseful for prototyping software and testing basic program logic

Disadvantages of using an Interpreter

Source code is required for the program to be executed and this source code can be read making it insecureInterpreters are generally slower than compiled programs due to the per-line translation method

1)Name the three types of program translator and describe what level of language each translates

Answer

Assembler - Low Level (Assembly) Compiler - High Level Interpreter - High Level

2) Why might you want to use a compiler instead of an interpreter

A compiler makes faster more secure code

3) Why might you want to use an interpreter instead of a compiler

Interpreters allow for code to run on multiple platforms it allows you to debug and test code without having to re-compile the entire source code

Loader

A Loader is system program that place the object program into main memory and prepares it for execution

bull Basic functions of loader

ndash Allocation

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 5

SP NOTES

ndash Linking

ndash Relocation

ndash Loading

Types of Loader-

bull Compile-and-go Loader

bull Relocating Loader

bull Direct Linking Loader

bull Absolute Loader

bull General Loader

bull Dynamic Loader

Macro amp Macro processor

Macro ndash Macro is a single line abbreviation for a group of instruction

MACRO -------- Start of definition

INCR -------- Macro name

A 1DATA

A 2DATA Sequence of instructions to be abbreviated

A 3DATA

MEND -------- End of definition

Linking and Linker

bull Linking ndash The Process of merging many object modules to form a single object program is called as linking

bull Linker bull The Linker is the software program which binds many object modules to make a single object program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 6

SP NOTES

Lecture_no4 Foundation of system programming

1) Formal Systembull A formal system is an un interpreted calculus

It consists of

ndash Alphabets

ndash A set of words called Axioms

ndash Finite set of relations called rules of inference or production rules

ndash Ex Boolean algebra

2) Need Of System Software-The basic need of system software is to achieve the following goals -

bull To achieve efficient performance of the system

bull To make effective execution of general user program

bull To make effective utilization of human resources

bull To make available new better facilities

Need of system programming

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 7

SP NOTES

Lecture_no5 Evolution of operating systemGeneral machine structure

OSJobmultiprogrammingMultitasking

FragmentationPagingScheduling

Facilities of computer os

Operating Systembull It is the collection of system programs which acts as an interface between user and the computer and computer hardware

bull The purpose of an operating system is to provide an environment in which A user can execute programs in a convenient manner

Functions of Operating System

bull File handling and management

bull Storage management (Memory management)

bull Device scheduling and management

bull CPU scheduling

bull Information management

bull Process control (management)

bull Error handling

bull Protecting itself from user amp protecting user from other users

What is MFT

What is MVT

What is Fragmentation

What is paging Explain its various types

What is scheduling

What is meant by the term time sharing

What is virtual memoryWhat are simple batched systems Explain

Give an evolution of an operating system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 8

SP NOTES

Lecture_no6 Description of Von-Neumann machines components

General machine structureVon-Neumannrsquos componentsBlock diagram and functions of each component of Von-neumannrsquos machine

1 The von Neumann Computer Modelbull Von Neumann computer systems contain three main building blocks-gt the central processing unit (CPU)-gt memory-gt and inputoutput devices (IO)bull These three components are connected together using the system busbull The most prominent items within the CPU are the registers they can be manipulated directly by acomputer program The following block diagram shows major relationship between CPU components

(Design of the VonNeumann architecture)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 9

SP NOTES

2 Components of the Von Neumann Model-gt Memory Storage of information (dataprogram)-gt Processing Unit ComputationProcessing of Information-gt Input Means of getting information into the computer eg keyboard mouse-gtOutput Means of getting information out of the computer eg printer monitor bull Control Unit Makes sure that all the other parts perform their tasks correctly and at the correct time The von Neumann Machine

(connection between CPU amp Main memory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 10

SP NOTES

MBR (Memory Buffer Register) = MBR contains a word to be stored in memory or is sued to receive from memory

MAR (Memory Address Register) = MAR specifies the address in memory of the word to be written from or read into the MBR

IR (Instruction Register) = IR contains the 8-bit op-code instruction being executedIBR (Instruction Buffer Register) = IBR is employed to hold temporarily the right-hand instructions

from a word in memoryPC (Program Counter) = PC contains the address of the next instruction-pair to be fetched from

memoryAC and MQ (Accumulator and Multiplier Quotient) = AC and MQ are employed to hold temporarily

operands and results of ALU operations

Fetch The process of retrieving a value from a memory location in order to put it in a register or pass it to a processing unit

IO controller A special-purpose processor that mediates between the main computer processor and a specific input or output device The processor records the request made by the main processor leaving it free to continue other tasks and interrupts the main processor when the input or output task is complete

Input Output The subsystem responsible for managing communications with external devices whether external memory storage other processors or devices for human interaction

Instruction set The set of codes that describe all the legal operations a particular computer can execute

Op code An unsigned binary integer that is assigned to a specific task the hardware can perform

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 11

SP NOTES

Register A special kind of very fast memory which is referred to by name rather than by a memory address number Registers often hold data values that are currently in use by the ALU or other subsystems of the architecture

Random access memory (RAM) The typical computer memory where each cell is addressed individually and may be updated or accessed in time that is independent of the cellrsquos location

Read-only memory (ROM) A memory that is usually very similar to RAM except it may only be accessed not updated

Store The process of writing a value to a memory cell

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 12

SP NOTES

Lecture_no7 Instruction format with microflowchart for ADD instruction

Flowchart for instruction fetchamp execution of general machine structure

Typical operating system Information flow in microflowchart for ADDSUBMULT

Memory operations

FETCH (addr)

1Put addr into MAR2Tell memory unit to ldquoloadrdquo3Memory copies data into MDR

STORE(addrnew-value)

1 Put addr into MAR2 Put new-value into MDR3 Tell memory unit to ldquostorerdquo4 Memory stores data from MDR into memory cell

Problem 1)

a) Write an assembly program for a GPR machine having 8 general purpose registers (R1R7) to evaluate the following expression Ignore the remainder part of the division process

( (a + b) (a + d) ) (a e)

where abde are variables with addresses ABDE respectively and each operation is defined as follows + --------gt ADD Ri Rj ( Rj lt- Ri + Rj) --------gt MULT Ri Rj ( Rj lt- Ri Rj) --------gt DIV Ri Rj ( Rj lt- Rj Ri)

a = M[A] ------gt content of memory location at address A same for B D E

move A D0 D0 = amove B D4 D4 = badd D0 D4 D4 = a + bmove D D1 D1 = dadd D0 D1 D1 = a + ddiv D1D4 D4 = (a + b) (a + d)move E D2 D2 = emult D0 D2 D2 = (a e)mult D4 D2 D2 = ( (a + b) (a + d) ) (a e)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 13

SP NOTES

Problem 2)

a) Provide micro-instructions in the form of detailed flowchart for the execution of the following instructions

b) Show the flow of the micro-instructions within the GPR architecture by highlighting the system buses and the registers used

i)move X D0 move contents of memory location X to register D0ii)move D0 X move contents of register D0 to memory location Xiii)movea 22(a3)25(a1) move contents from memory to memoryiv)bgtw label branch if greater than to location ldquolabelrdquo

Answer 2)

For each of the above for part b) draw the CPU consisting of all the registers (for example MAR MBR PC IR Address and Data Registers ALU) and Memory and show each of the micro-instruction as in part a)

i) move X D0Machine Instruction

0011 0000 0011 1000

Address of X

a) Micro-steps are

MAR lt-- PC

MBR lt-- M[MAR] fetch

IR lt-- MBR

| decode |PC lt-- PC + 2 PC points to source addressMAR lt-- PCMBR lt-- M[MAR] address of X in MBRMAR MBR move X to MARMBR M[MAR] read contents of location XD0 MBR PC lt-- PC + 2 PC points to next instruction

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 14

SP NOTES

ii) move D0 X

Machine Instruction

0011 1110 0000 0000

Address of X

a) Micro-steps areMAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | decode |PC lt-- PC + 2 PC points to destination addressMAR lt-- PCMBR M[MAR] read destination address XMAR MBR MAR has X now

MBR D0 source data in MBR

M[MAR] MBR data in Effective address X (which is in MAR)

PC lt-- PC + 2 PC points to next instruction

iii) movea 22(a3)25(a1)

Machine Instruction

0011 0011 0110 1011

0000 0000 0001 0110

0000 0000 0001 1001

a)Source Effective address = contents of a3 + $16temp lt--- M[a3 + $16]

Destination Effective address = contents of a1 + $19M[a1 + $19] lt- temp

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 15

SP NOTES

Micro-steps areMAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | decode |PC lt-- PC + 2 PC points to source displacementMAR lt-- PCMBR lt-- M[MAR] source displacement loadedMAR lt-- [A3 + MBR] source effective address calculatedMBR lt- M[MAR] source data in MBRWR lt- MBR saved temporarily since MBR is to be reusedPC lt-- PC + 2 PC points to destination displacementMAR lt-- PCMBR lt-- M[MAR] destination displacement loaded

MAR lt-- [A1 + MBR] destination effective address calculatedMBR lt- WR source data back to MBR

M[MAR] lt-- MBR source data moved to destination

PC lt-- PC + 2 PC points to next instruction

iv) bgtw label

Machine Instruction

0110 1110 0000 0000 ---- displacement ----

a)MAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | interpret decode |

TRUE FALSE | |

PC lt-- PC + 2 PC lt- PC + 4 MAR lt-- PC MBR lt-- M[MAR] get displacement PC lt-- PC + MBR execute

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 16

SP NOTES

Lecture_no8 Memory unit Register Data of IBM 360

General machine structure of IBM 360370

The IBM System360 (S360) was a mainframe computer system family announced by IBM on April 7 1964 and delivered between 1965 and 1978 It was the first family of computers designed to cover the complete range of applications from small to large both commercial and scientific

(i)Memory unit

Memory (storage) in System360 is addressed in terms of 8-bit bytes Various instructions operate on larger units called halfword (2 bytes) fullword (4 bytes) doubleword (8 bytes) quad word (16 bytes) and 2048 byte storage block specifying the leftmost (lowest address) of the unit Within a halfword fullword doubleword or quadword low numbered bytes are more significant than high numbered bytes this is sometimes referred to as big-endian Many uses for these units require aligning them on the corresponding boundaries Within this article the unqualified term word refers to a fullword

The architecture of System360 provided for up to 224 = 16777216 bytes of memory however the Model 67 extended the architecture and allowed 232 = 4294967296 bytes of virtual memory

(ii)Register-

Registers are areas of storage that are set aside in the processing unit

There are 16 registers in the 370 system (General purpose registerFloating-point registerControl amp status registerdata register address register)

Four fields of information-1) the name field- optional field and must begin with a character from the alphabet can consist of 8 characters2) opcode ndash mandatory field and must be followed by a blank3) operands ndash usually represent data that could be in main storage registers or as part of the instruction itself4) comments ndash used to clarify instructions

(iii)Data

Characters are stored as 8-bit bytes Integers are stored as twos complement binary halfword or fullword values Packed decimal numbers are stored as 1-16 8-bit bytes containing an odd number of decimal digits

followed by a 4-bit sign Sign values of hexadecimal A C E and F are positive and sign values of hexadecimal B and D are negative Digit values of hexadecimal A-F and sign values of 0-9 are invalid but the PACK and UNPK instructions do not test for validity

Zoned decimal numbers are stored as 1-16 8-bit bytes each containing a zone in bits 0-3 and a digit in bits 4-7 The zone of the rightmost byte is interpreted as a sign

Floating point numbers are only stored as fullword or doubleword values on older models On the 36085and 360195there are also extended precision floating point numbers stored as quadwords For all

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 17

SP NOTES

three formats bit 0 is a sign and bits 0-7 are a characteristic (exponent biased by 64) Bits 8-31 (8-63) are a hexadecimal fraction For extended precision the low order doubleword has its own sign and characteristic which are ignored on input and generated on output

Lecture_no9 Various instruction format of IBM 360370

( iv)Instruction

Instructions in the S360 are two four or six bytes in length with the opcode in byte 0 Instructions have one of the following formats

RR (two bytes) R tells that there is data residing in a register So in this type there is data in register R

Generally byte 1 specifies two 4-bit register numbers but in some cases eg SVC byte 1 is a single 8-bit immediate field

RS (four bytes) This instruction can have 2 or 3 operands It depends how many the opcode indicates If it is 2 the first is a register and the second storage If it is 3 then the first and second are registers and the third is a storage location

Byte 1 specifies two register numbers bytes 2-3 specify a base and displacement

RX (four bytes) The X also refers to a storage address however this specifies actual storage addresses in 3 pieces

Byte 1 bits 0-3 specifies either a register number or a modifier byte 1 bits 4-7 specifies the number of the general register to be used as an index bytes 2-3 specify a base and displacement

SI (four bytes) I tells that the operand consists of immediate data (meaning that the data is in the instruction itself) So in this type the first is a storage location and the second is immediate data

Byte 1 specifies an immediate field bytes 2-3 specify a base and displacement

SS (six bytes) S tells that that there is data in storage Here both operands have data in storage

Byte 1 specifies two 4-bit length fields or one 8-bit length field bytes 2-3 and 4-5 each specify a base and displacement The encoding of the length fields is length-1

Instructions must be on a two-byte boundary in memory hence the low-order bit of the instruction address is always 0

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 18

SP NOTES

Addressing Mode

1Implied mode Operand is in a register implied by the OpCode of the instruction ExampleADD 312 Immediate mode Operand is a constant value specified in the instruction itself ExampleADD R 103 Register mode Operand is in register specified in the instruction itself ExampleADD S D

4 Register indirect mode Operand are is in a memory address that is content of a register specifiedby the instruction itself ExampleADD (D) 35 Direct mode Absolute address of operand is explicitly specified in instruction ExampleADD 1234 S6 Indirect mode Absolute address of operand is content of a memory address ExampleADD [1234] 107 Relative mode Content of PC + OpAddr ExampleADD D $S8 Indexed mode Content of an index register + OpAddr ExampleADD D 500(S)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 19

SP NOTES

Lecture_no10 Machine language Long way no looping method

Elementary Instruction Set

Typical Data Transfer Instructions

Typical Arithmetic Instructions

Typical Logical and Bit Manipulation Instr

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 20

SP NOTES

Example Write a program that compare two positive numbers x and y Output is -1 if x lt y 0if x = y +1 if x gt y

MOVE R1 x MOVE R2 y MOVE R3 0 SUB R1 R2 BN Less BZ Zero MOVE R3 +1Less MOVE R3 -1Zero MOVE R3 0 HLT

Lecture_no11 Address modification using instruction as data ampusing index register

Lecture_no12 Looping method with example

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 21

SP NOTES

Lecture_no13 Assembly language machine language

Assembly LanguageAssembly instructions are just shorthand for machine instructions

Machine language Equivalent assembly1000000100100101 LOAD R1 51000000101000101 LOAD R2 51010000100000110 ADD R0 R1 R21000001000000110 SAVE R0 61111111111111111 HALT (For all assembly instructions that compute a result the first argument is the destination)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 22

SP NOTES

Very easy to write an Assembly Language -gtMachine language translator

Exercise

What would be the assembly instructions to swap the contents of registers 1 amp 2Exercise Solution

STORE 1 R1MOVE R1 R2LOAD R2 1HALT

Machine Language

The machine language for a particular computer is tied to the architecture of the CPU For example G4 Macs have a different machine language than Intel PCs We will look at the machine language of a simple simulated computer

Machine Language ExampleOur computer has 4 registers and 32 memory locations Each instruction is 16 bits Here is a machine language program for our simulated computer

10000001001001011000000101000101101000010000011010000010000001101111111111111111

LOAD contents of memorylocation into register

100000010 RR MMMMMex R0 = Mem[3]100000010 00 00011

STORE contents of registerinto memory location

100000100 RR MMMMMex Mem[4] = R0100000100 00 00100

MOVE contents of oneregister into another register

100100010000 RR RRex R0 = R1100100010000 00 01

ADD contents of 2 registersstore result in third

1010000100 RR RR RRex R0 = R1 + R21010000100 00 01 10

SUBTRACT contents of 2registers store result into third

1010001000 RR RR RRex R0 = R1 ndash R2

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 23

SP NOTES

1010001000 00 01 10

Halt the program 1111111111111111

Lecture_no14 Meaning of Assembly language program types Literals

bullUSING

bullBALR

bullSTART

bullEND

bullBR

bullLiteral

bullLTORG

bullBCT

bullEQU ExampleSTART EQU 10

The directive EQU stands for equals it allows you to give a numerical value to a label In this example the assembler would always replace START with 1016 Hence you could write

START EQU 10ORG STARTGOTO 10

bullORG

The directive ORG does not get translated into any code what it does is to set the address at which the next instruction will be stored

Example

ORG 0GOTO 10

The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000

But it will be placed at the address 0 ORG 0 does not get translated into any code but it affects where the code for GOTO 10 is placed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 24

SP NOTES

bullpseudo-ops vs machine-ops

Lecture_no15 Example on Assembly language program

Assembly language is a low-level programming language for a computer or other programmable device specific to a particular computer architecture in contrast to most high-level programming languages which are generally portable across multiple systems Assembly language is converted into executable machine code by a utility program referred to as an assembler like NASM MASM etc

LABEL OPCODE OPERANDS COMMENTS01 02 Program to multiply an integer by the constant 603 Before execution an integer must be stored in NUMBER0405 ORIG x305006 LEA R2 NUMBER07 LDW R2 R2 008 LEA R1 SIX09 LDW R1 R1 00A AND R3R30 Clear R3 It will0B contain the product0C The inner loop0D 0E AGAIN ADD R3R3R20F ADD R1R1-1 R1 keeps track of10 BRp AGAIN the iterations11 12 HALT13

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 25

SP NOTES

14 NUMBER BLKW 115 SIX FILL x000616 17 END

Figure 71 An assembly language programTwo of the parts (LABEL and COMMENTS) are optionalOpcodes and OperandsTwo of the parts (OPCODE and OPERANDS) are mandatory The OPCODE is a symbolic name for the opcode

The number of operands depends on the operation being performedAGAIN ADD R3R3R2The operands to be added are obtained from register 2 and from register 3 The resultis to be placed in register 3 We represent each of the registers 0 through 7 as R0 R2helliphelliphelliphellip R7

The location whose address is to be read is given the label NUMBER The destination into which that address is to be loaded is register 2LEA R2 NUMBERA literal value must contain a symbol identifying the representation base of the number We use for decimal x for hexadecimal and b for binary

LabelsLabels are symbolic names which are used to identify memory locations that are referred to explicitly in the programThere are two reasons for explicitly referring to a memory location1 The location contains the target of a branch instruction (for example AGAIN in line 0E)2 The location contains a value that is loaded or stored (for example NUMBER line 14 and SIX line 15)The location AGAIN is specifically referenced by the branch instruction in line 10

BRp AGAIN

If the result of ADD R1R1ndash1 is positive (as evidenced by the P condition code being set) then the program branches to the location explicitly referenced as AGAIN to perform another iterationThe value stored in the memory location explicitly referenced as NUMBER is loaded into R2If a location in the program is not explicitly referenced then there is no need to give it a label

CommentsComments are messages intended only for human consumption The purpose of comments is to make the program more comprehensible to the human reader

Pseudo-ops (Assembler Directives)

Actually a more formal name for a pseudo-op is assembler directive They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution Rather the pseudo-op is strictly a message to the assembler to help the assembler in the assembly processORIG

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 26

SP NOTES

ORIG tells the assembler where in memory to place the program In line 05 ORIG x3050 says start with location x3050 As a result the LEA R2NUMBER instruction will be put in location x3050FILLFILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand In line 15 the ninth location in the resultant program is initialized to the value x0006BLKWBLKW tells the assembler to set aside some number of sequential memory locations (ie a BLocK ofWords) in the program The actual number is the operand of the BLKW pseudo-op In line 11 the pseudo-op instructs the assembler to set aside one location in memory

Lecture_no16 surprise test

Lecture_no17 MODULE-2

AssemblerWhy learning AssemblerAssembler or other languages that is the question Why should I learn another language if I alreadylearned other programming languages The best argument while you live in France you are able to getthrough by speaking english but you will never feel at home then and life remains complicated You canget through with this but it is rather inappropriate If things need a hurry you should use the countryslanguageAssemblerTo translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmerDisassenblerTo translate machine code to their assembly source

Cross AssemblerThe assembler may run on one machine and produce object code for another

An assembly is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader

(Fig Assembler)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 27

SP NOTES

For example the externally defined symbols (library program) must be indicated to the loaderthe assembler does not know the address of these symbols and it is up to the loader to find theprograms containing them load them into memory and place the values of these symbols in thecalling program

(Fig Program Translation Steps)

Fundamental assembler functions1048698 Translate mnemonic operation codes their machine language equivalents1048698 Assign machine addresses to symbolic labels used by the programmer

Basic Assembler Functions

Convert mnemonic operation codes to machine language equivalents

Convert symbolic operands to machine addresses (pass 1)

Build machine instructions in the proper format

Convert data constants to internal (machine)representations

Error checking is provided

Write the object program and assembly listing files

Assembler directives

-gtThe assembler can also process assembler directivesbull Assembler directives (or pseudo-instructions) provide instructions to the assembler itself Theyare not translated into machine instructions

-gtPseudo-Instructions Not translated into machine instructions Providing information to the assembler-gtBasic assembler directives-

bull START -Specify name and staring address for the program END- Indicate the end of the source program and (optionally) specify the first executable instruction in

the program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 28

SP NOTES

BYTE -Generate character or hexadecimal constant occupying as many bytes as needed to represent the constant

WORD- Generate one-word integer constant RESB- Reserve the indicated number of bytes for a data area RESW- Reserve the indicated number of words for a data area

Lecture_no18 General design procedure of 360-type Assembler

Syntax Assembly language statements

Assembly language statements are entered one statement per line Each statement follows the following format

[label] mnemonic [operands] [comment]

The fields in the square brackets are optional A basic instruction has two parts the first one is the name of the instruction (or the mnemonic) which is to be executed and the second are the operands or the parameters of the command

An assembly language statement contains the following fields -gt Label Field -it is an identifier amp optional fieldThe maximum lengnth of label differs between assemblers It can be used to define a symbolSome accepts upto 32 characters long other only 4 charactersA label when declared is suffixed by a colon and begins with valid characters(AhelliphellipZ)

Ex-

START LDAA 24 H

Here Label START is equal to the address of the instruction LDAA 24 H -gt MnemonicOpcode -This field contains a mnemonic It stands for operation code or pseudo-opie a machine code instruction it requires additional information ieoperands -gt Operand Field -it specifies either the address or the data that the opcode requires -gt Comment Field allows the programmer to document the software This field is optional and is only printed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 29

SP NOTES

on the source listing for documentation purposes The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character

Types of Assembly language statements

It consists of 3 typeseg

(i)Imperative statement

(ii)Declarative statements

(iii)Assembler directive statements

Advantages amp disadvantages of Assembly language

Lecture_no19 Design Specification of 1-pass assembler with example

Following steps are used for design specification of an assembler-1)Identify the information necessary to perform a task2)Specify the data structure and define format of data structure3)Determine the processing necessary to obtain and maintain the information4)Determine the processing necessary to perform a task

Assembler Design can be done in

ndashSingle pass

ndashTwo pass

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 30

Source program

Pass1

OPTAB SYMTAB

Intermediate file

Pass2

SYMTAB

Object codes

OPTAB

SP NOTES

The Intermediate file include each source statement assigned address and error indicator

Pass I

Separate the symbols mnemonic op-code and operational fields Determine the storage requirement for every assembly language statement and up date the location

counter Build the symbol table (Table that is used to store each label and its corresponding value)

Pass II

Generate object code(perform actual translation)

One-Pass Assembler

The translation performed by an assembler is essentially a collection of substitutions machine operation code

for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines

bullSingle Pass AssemblerndashDoes everything in single passndashCannot resolve the forward referencing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 31

LOCCTR

SP NOTES

(detailed pass1 assembler flowchart)

Internal data structures1048698 the Operation Code Table (OPTAB) bull OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents1048698 the Symbol Table (SYMTAB) bull SYMTAB is used to store values (addresses) assigned to labels1048698 Location Counter (LOCCTR) bull This is a variable that is used to help in the assignment of address LOCCTR not LOCCTR + (instruction length)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 32

SP NOTES

Lecture_no20 Multi-pass assembler with example

Pass 1 Assign addresses to all statements in the program Save the values assigned to all labels for use in Pass 2 Perform some processing of assembler directivesPass 2 Assemble instructions Generate data values defined by BYTE WORD Perform processing of assembler directives not done in Pass 1 Write the object program and the assembly listing

(Detailed pass2 assembler flowchart)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 33

SP NOTES

Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler

The difference between one pass and two pass assemblers are-A one pass assembler passes over the source file exactly once in the same pass collecting the labels resolving future references and doing the actual assembly The difficult part is to resolve future label references and assemble code in one pass

A two pass assembler does two passes over the source file ( the second pass can be over a file generated in the first pass ) In the first pass all it does is looks for label definitions and introduces them in the symbol table In the second pass after the symbol table is complete it does the actual assembly by translating the operations and so on

Assembler Design Machine Dependent Assembler Features instruction formats and addressing modes program relocation

Machine Independent Assembler Features literals symbol-defining statements expressions program blocks control sections and program linking

Object Program

Header Col 1 H Col 2~7 Program name Col 8~13 Starting address (hex) Col 14-19 Length of object program in bytes (hex) Text Col1 T Col2~7 Starting address in this record (hex) Col 8~9 Length of object code in this record in bytes (hex) Col 10~69Object code (69-10+1)6=10 instructions End Col1 E Col 2~7 Address of 16 Col2 first executable instruction (hex) (END program_name)

Absolute ProgramThe program must be loaded at the specified (absolute) address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 34

SP NOTES

Relocatable Program

The actual starting address is not known until load time The program can be loaded at different address of memory

How to Solvea Modification recordb Base relativec Program counter relative

Modification record Col 1 M Col 2-7 Starting location of the address field to be modified relative to the beginning of the program Col 8-9 length of the address field to be modified in halfbytes

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 35

SP NOTES

Lecture_no22 Macro language amp Macro processor

MACRO PROCESSOR

The assembly language programmer often finds it necessary to repeat some statements or block of code several times in a program The block may consist of code to swap sets of registers do some arithmetic operations In this situation the programmer find a macro instruction facility useful Macro instruction (often called macros) are single line abbreviation for group of instructions In employing a macro the programmer essentially defines a single instruction to represent a block of code For every occurrence of this one-line macro instruction in his program the macro processing assembler substitute the entire block

A macro represents a commonly used group of statements in the source pro ramming language The macro processor replaces each macro instruction with the corresponding group of source language statements This is called expanding of macros

Obj

Basic Macro Processor Functions

Macro Definition and Usage

In this section we will present one more example to highlight salient aspects of macro-processor The example is very similar to Intels 8 bit microprocessor assembly language instruction

ExampleMACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 36

Source code (with macro)

Macro Processor

Expanded code

Compiler Assembler

SP NOTES

A macro definition is placed at the start of a program enclosed between the statements MACRO and ENDM A MACRO statement indicates that a macro definition starts while statement ENDM indicates the end of a macro definition Thus a group of statements start ing with MACRO and ending with ENDM constitutes one macro definition unit If many macros are to be defined in a program as many definition

modules will exist at the start of the program Each definition module contains a new operation and defines it to consist of a sequence of assembly language statement In example above INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence

The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement Appearance of a macro name in the mnemonic field amounts to a call on the macro The assembler replaces such a statement by the statement sequence comprising the macro This is known as macro expansion

INCRMT XY

is shown to lead to insertion of the assembly statements

LOAD X

ADD Y

STORE X

in its place All macro calls in a program are expanded in this fashion

Defining a Macro

Let us take another look at the macro definition unit

MACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

The macro header statement indicates the existence of a macro definition unit Absence of the header statement as the first statement of a program or ft first statement following a macro definition unit signals the start of the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 37

SP NOTES

main assembly language program The next statement in the definition unit is die prototype for a macro call This statement names the macro and indicates how the operands in any call on the macro would be written

The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion

Basic Macro Processor Functions

-gtDirectives used during usage of Macro

Macro Indicates begin of Macro MEND indicates end of Macro

-gtPrototype for Macro

Each argument starts with Name and macro Parameter list

MEND

-gtBODY The statement will be generated as the expansion of Macro

Macro Expansion

Each macro invocation statement will be expanded into the statements that form the body of the macro

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 38

SP NOTES

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)

In the definition of macro parameter In the macro invocation argument

Lecture_no23 Machine-Independent Macro Processor Features

MACRO INVOCATION

A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro

macro_name p1 p2 hellip

Difference between macro call and procedure call

Macro call statements of the macro body are expanded each time the macro is invoked

Procedure call statements of the subroutine appear only one regardless of how many times the subroutine is called

Macro parameters

i) Positional parameter

The prototype statement indicates how operands in a macro call would be written These operands are called parameters or arguments All parameters used in the prototype statement have names starting with the special character amp These parameters are known as formal parameters A macro call is written using parameter names which do not start with ft special character amp These are known as actual parameters

The lists of formal and actual parameters also called as formal and actual parameter lists specified in the prototype and macro call statements respectively establish a correspondence between each formal parameter and an actual parameter In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists Thus the first actual parameter in the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 39

SP NOTES

fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again

INCRMT ampA ampB prototype

INCRMT XY macro call

We see that X would be paired with ampA and Y with ampB While expanding a macro call any formal parameter appearing within a model statement is replaced by the corresponding actual parameter This is how expansion of the call INCR XY heads to the following statements

LOAD X

ADD Y

STORE X

Example

STRG DATA1 DATA2 DATA3

GENER DIRECT3

(ii) Keyword parameter

Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter

ExampleSTRG ampa3=DATA1 ampa2=DATA2 ampa1=DATA3

GENER TYPE=DIRECT CHANNEL=3

Features of macro facility

(1)Macro instruction argument

(2)Macro calls within macro(nested macro calls)

(3)Macro instruction defining(macros within macro)

(4)Conditional macro ExpansionLecture_no24 Design of 1-pass macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 40

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Lecture_no3 Evolution of components of system programming

Assembler

An assembler translates assembly language into machine code Assembly language consists of mnemonics for machine opcodes so assemblers perform a 11 translation from mnemonic to a direct instruction For example

LDA 4 converts to 0001001000100100

Conversely one instruction in a high level language will translate to one or more instructions at machine level

Advantages of using an Assembler

Very fast in translating assembly language to machine code as 1 to 1 relationshipAssembly code is often very efficient (and therefore fast) because it is a low level languageAssembly code is fairly easy to understand due to the use of English-like mnemonics

Disadvantages of using Assembler

Assembly language is written for a certain instruction set andor processorAssembly tends to be optimised for the hardware its designed for meaning it is often incompatible with

different hardwareLots of assembly code is needed to do relatively simple tasks and complex programs require lots of

programming time

Compiler

A Compiler is a computer program that translates code written in a high level language to a lower level language objectmachine code The most common reason for translating source code is to create an executable program (converting from a high level language into machine language)

Advantages of using a compiler

Source code is not included therefore compiled code is more secure than interpreted codeTends to produce faster code than interpreting source codeProduces an executable file and therefore the program can be run without need of the source code

Disadvantages of using a compiler

Object code needs to be produced before a final executable file this can be a slow processThe source code must be 100 correct for the executable file to be produced

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 4

SP NOTES

Interpreter

An interpreter program executes other programs directly running through program code and executing it line-by-line As it analyses every line an interpreter is slower than running compiled code but it can take less time to interpret program code than to compile and then run it mdash this is very useful when prototyping and testing code Interpreters are written for multiple platforms this means code written once can be run immediately on different systems without having to recompile for each Examples of this include flash based web programs that will run on your PC MAC games console and Mobile phone

Advantages of using an Interpreter

Easier to debug(check errors) than a compilerEasier to create multi-platform code as each different platform would have an interpreter to run the same

codeUseful for prototyping software and testing basic program logic

Disadvantages of using an Interpreter

Source code is required for the program to be executed and this source code can be read making it insecureInterpreters are generally slower than compiled programs due to the per-line translation method

1)Name the three types of program translator and describe what level of language each translates

Answer

Assembler - Low Level (Assembly) Compiler - High Level Interpreter - High Level

2) Why might you want to use a compiler instead of an interpreter

A compiler makes faster more secure code

3) Why might you want to use an interpreter instead of a compiler

Interpreters allow for code to run on multiple platforms it allows you to debug and test code without having to re-compile the entire source code

Loader

A Loader is system program that place the object program into main memory and prepares it for execution

bull Basic functions of loader

ndash Allocation

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 5

SP NOTES

ndash Linking

ndash Relocation

ndash Loading

Types of Loader-

bull Compile-and-go Loader

bull Relocating Loader

bull Direct Linking Loader

bull Absolute Loader

bull General Loader

bull Dynamic Loader

Macro amp Macro processor

Macro ndash Macro is a single line abbreviation for a group of instruction

MACRO -------- Start of definition

INCR -------- Macro name

A 1DATA

A 2DATA Sequence of instructions to be abbreviated

A 3DATA

MEND -------- End of definition

Linking and Linker

bull Linking ndash The Process of merging many object modules to form a single object program is called as linking

bull Linker bull The Linker is the software program which binds many object modules to make a single object program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 6

SP NOTES

Lecture_no4 Foundation of system programming

1) Formal Systembull A formal system is an un interpreted calculus

It consists of

ndash Alphabets

ndash A set of words called Axioms

ndash Finite set of relations called rules of inference or production rules

ndash Ex Boolean algebra

2) Need Of System Software-The basic need of system software is to achieve the following goals -

bull To achieve efficient performance of the system

bull To make effective execution of general user program

bull To make effective utilization of human resources

bull To make available new better facilities

Need of system programming

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 7

SP NOTES

Lecture_no5 Evolution of operating systemGeneral machine structure

OSJobmultiprogrammingMultitasking

FragmentationPagingScheduling

Facilities of computer os

Operating Systembull It is the collection of system programs which acts as an interface between user and the computer and computer hardware

bull The purpose of an operating system is to provide an environment in which A user can execute programs in a convenient manner

Functions of Operating System

bull File handling and management

bull Storage management (Memory management)

bull Device scheduling and management

bull CPU scheduling

bull Information management

bull Process control (management)

bull Error handling

bull Protecting itself from user amp protecting user from other users

What is MFT

What is MVT

What is Fragmentation

What is paging Explain its various types

What is scheduling

What is meant by the term time sharing

What is virtual memoryWhat are simple batched systems Explain

Give an evolution of an operating system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 8

SP NOTES

Lecture_no6 Description of Von-Neumann machines components

General machine structureVon-Neumannrsquos componentsBlock diagram and functions of each component of Von-neumannrsquos machine

1 The von Neumann Computer Modelbull Von Neumann computer systems contain three main building blocks-gt the central processing unit (CPU)-gt memory-gt and inputoutput devices (IO)bull These three components are connected together using the system busbull The most prominent items within the CPU are the registers they can be manipulated directly by acomputer program The following block diagram shows major relationship between CPU components

(Design of the VonNeumann architecture)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 9

SP NOTES

2 Components of the Von Neumann Model-gt Memory Storage of information (dataprogram)-gt Processing Unit ComputationProcessing of Information-gt Input Means of getting information into the computer eg keyboard mouse-gtOutput Means of getting information out of the computer eg printer monitor bull Control Unit Makes sure that all the other parts perform their tasks correctly and at the correct time The von Neumann Machine

(connection between CPU amp Main memory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 10

SP NOTES

MBR (Memory Buffer Register) = MBR contains a word to be stored in memory or is sued to receive from memory

MAR (Memory Address Register) = MAR specifies the address in memory of the word to be written from or read into the MBR

IR (Instruction Register) = IR contains the 8-bit op-code instruction being executedIBR (Instruction Buffer Register) = IBR is employed to hold temporarily the right-hand instructions

from a word in memoryPC (Program Counter) = PC contains the address of the next instruction-pair to be fetched from

memoryAC and MQ (Accumulator and Multiplier Quotient) = AC and MQ are employed to hold temporarily

operands and results of ALU operations

Fetch The process of retrieving a value from a memory location in order to put it in a register or pass it to a processing unit

IO controller A special-purpose processor that mediates between the main computer processor and a specific input or output device The processor records the request made by the main processor leaving it free to continue other tasks and interrupts the main processor when the input or output task is complete

Input Output The subsystem responsible for managing communications with external devices whether external memory storage other processors or devices for human interaction

Instruction set The set of codes that describe all the legal operations a particular computer can execute

Op code An unsigned binary integer that is assigned to a specific task the hardware can perform

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 11

SP NOTES

Register A special kind of very fast memory which is referred to by name rather than by a memory address number Registers often hold data values that are currently in use by the ALU or other subsystems of the architecture

Random access memory (RAM) The typical computer memory where each cell is addressed individually and may be updated or accessed in time that is independent of the cellrsquos location

Read-only memory (ROM) A memory that is usually very similar to RAM except it may only be accessed not updated

Store The process of writing a value to a memory cell

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 12

SP NOTES

Lecture_no7 Instruction format with microflowchart for ADD instruction

Flowchart for instruction fetchamp execution of general machine structure

Typical operating system Information flow in microflowchart for ADDSUBMULT

Memory operations

FETCH (addr)

1Put addr into MAR2Tell memory unit to ldquoloadrdquo3Memory copies data into MDR

STORE(addrnew-value)

1 Put addr into MAR2 Put new-value into MDR3 Tell memory unit to ldquostorerdquo4 Memory stores data from MDR into memory cell

Problem 1)

a) Write an assembly program for a GPR machine having 8 general purpose registers (R1R7) to evaluate the following expression Ignore the remainder part of the division process

( (a + b) (a + d) ) (a e)

where abde are variables with addresses ABDE respectively and each operation is defined as follows + --------gt ADD Ri Rj ( Rj lt- Ri + Rj) --------gt MULT Ri Rj ( Rj lt- Ri Rj) --------gt DIV Ri Rj ( Rj lt- Rj Ri)

a = M[A] ------gt content of memory location at address A same for B D E

move A D0 D0 = amove B D4 D4 = badd D0 D4 D4 = a + bmove D D1 D1 = dadd D0 D1 D1 = a + ddiv D1D4 D4 = (a + b) (a + d)move E D2 D2 = emult D0 D2 D2 = (a e)mult D4 D2 D2 = ( (a + b) (a + d) ) (a e)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 13

SP NOTES

Problem 2)

a) Provide micro-instructions in the form of detailed flowchart for the execution of the following instructions

b) Show the flow of the micro-instructions within the GPR architecture by highlighting the system buses and the registers used

i)move X D0 move contents of memory location X to register D0ii)move D0 X move contents of register D0 to memory location Xiii)movea 22(a3)25(a1) move contents from memory to memoryiv)bgtw label branch if greater than to location ldquolabelrdquo

Answer 2)

For each of the above for part b) draw the CPU consisting of all the registers (for example MAR MBR PC IR Address and Data Registers ALU) and Memory and show each of the micro-instruction as in part a)

i) move X D0Machine Instruction

0011 0000 0011 1000

Address of X

a) Micro-steps are

MAR lt-- PC

MBR lt-- M[MAR] fetch

IR lt-- MBR

| decode |PC lt-- PC + 2 PC points to source addressMAR lt-- PCMBR lt-- M[MAR] address of X in MBRMAR MBR move X to MARMBR M[MAR] read contents of location XD0 MBR PC lt-- PC + 2 PC points to next instruction

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 14

SP NOTES

ii) move D0 X

Machine Instruction

0011 1110 0000 0000

Address of X

a) Micro-steps areMAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | decode |PC lt-- PC + 2 PC points to destination addressMAR lt-- PCMBR M[MAR] read destination address XMAR MBR MAR has X now

MBR D0 source data in MBR

M[MAR] MBR data in Effective address X (which is in MAR)

PC lt-- PC + 2 PC points to next instruction

iii) movea 22(a3)25(a1)

Machine Instruction

0011 0011 0110 1011

0000 0000 0001 0110

0000 0000 0001 1001

a)Source Effective address = contents of a3 + $16temp lt--- M[a3 + $16]

Destination Effective address = contents of a1 + $19M[a1 + $19] lt- temp

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 15

SP NOTES

Micro-steps areMAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | decode |PC lt-- PC + 2 PC points to source displacementMAR lt-- PCMBR lt-- M[MAR] source displacement loadedMAR lt-- [A3 + MBR] source effective address calculatedMBR lt- M[MAR] source data in MBRWR lt- MBR saved temporarily since MBR is to be reusedPC lt-- PC + 2 PC points to destination displacementMAR lt-- PCMBR lt-- M[MAR] destination displacement loaded

MAR lt-- [A1 + MBR] destination effective address calculatedMBR lt- WR source data back to MBR

M[MAR] lt-- MBR source data moved to destination

PC lt-- PC + 2 PC points to next instruction

iv) bgtw label

Machine Instruction

0110 1110 0000 0000 ---- displacement ----

a)MAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | interpret decode |

TRUE FALSE | |

PC lt-- PC + 2 PC lt- PC + 4 MAR lt-- PC MBR lt-- M[MAR] get displacement PC lt-- PC + MBR execute

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 16

SP NOTES

Lecture_no8 Memory unit Register Data of IBM 360

General machine structure of IBM 360370

The IBM System360 (S360) was a mainframe computer system family announced by IBM on April 7 1964 and delivered between 1965 and 1978 It was the first family of computers designed to cover the complete range of applications from small to large both commercial and scientific

(i)Memory unit

Memory (storage) in System360 is addressed in terms of 8-bit bytes Various instructions operate on larger units called halfword (2 bytes) fullword (4 bytes) doubleword (8 bytes) quad word (16 bytes) and 2048 byte storage block specifying the leftmost (lowest address) of the unit Within a halfword fullword doubleword or quadword low numbered bytes are more significant than high numbered bytes this is sometimes referred to as big-endian Many uses for these units require aligning them on the corresponding boundaries Within this article the unqualified term word refers to a fullword

The architecture of System360 provided for up to 224 = 16777216 bytes of memory however the Model 67 extended the architecture and allowed 232 = 4294967296 bytes of virtual memory

(ii)Register-

Registers are areas of storage that are set aside in the processing unit

There are 16 registers in the 370 system (General purpose registerFloating-point registerControl amp status registerdata register address register)

Four fields of information-1) the name field- optional field and must begin with a character from the alphabet can consist of 8 characters2) opcode ndash mandatory field and must be followed by a blank3) operands ndash usually represent data that could be in main storage registers or as part of the instruction itself4) comments ndash used to clarify instructions

(iii)Data

Characters are stored as 8-bit bytes Integers are stored as twos complement binary halfword or fullword values Packed decimal numbers are stored as 1-16 8-bit bytes containing an odd number of decimal digits

followed by a 4-bit sign Sign values of hexadecimal A C E and F are positive and sign values of hexadecimal B and D are negative Digit values of hexadecimal A-F and sign values of 0-9 are invalid but the PACK and UNPK instructions do not test for validity

Zoned decimal numbers are stored as 1-16 8-bit bytes each containing a zone in bits 0-3 and a digit in bits 4-7 The zone of the rightmost byte is interpreted as a sign

Floating point numbers are only stored as fullword or doubleword values on older models On the 36085and 360195there are also extended precision floating point numbers stored as quadwords For all

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 17

SP NOTES

three formats bit 0 is a sign and bits 0-7 are a characteristic (exponent biased by 64) Bits 8-31 (8-63) are a hexadecimal fraction For extended precision the low order doubleword has its own sign and characteristic which are ignored on input and generated on output

Lecture_no9 Various instruction format of IBM 360370

( iv)Instruction

Instructions in the S360 are two four or six bytes in length with the opcode in byte 0 Instructions have one of the following formats

RR (two bytes) R tells that there is data residing in a register So in this type there is data in register R

Generally byte 1 specifies two 4-bit register numbers but in some cases eg SVC byte 1 is a single 8-bit immediate field

RS (four bytes) This instruction can have 2 or 3 operands It depends how many the opcode indicates If it is 2 the first is a register and the second storage If it is 3 then the first and second are registers and the third is a storage location

Byte 1 specifies two register numbers bytes 2-3 specify a base and displacement

RX (four bytes) The X also refers to a storage address however this specifies actual storage addresses in 3 pieces

Byte 1 bits 0-3 specifies either a register number or a modifier byte 1 bits 4-7 specifies the number of the general register to be used as an index bytes 2-3 specify a base and displacement

SI (four bytes) I tells that the operand consists of immediate data (meaning that the data is in the instruction itself) So in this type the first is a storage location and the second is immediate data

Byte 1 specifies an immediate field bytes 2-3 specify a base and displacement

SS (six bytes) S tells that that there is data in storage Here both operands have data in storage

Byte 1 specifies two 4-bit length fields or one 8-bit length field bytes 2-3 and 4-5 each specify a base and displacement The encoding of the length fields is length-1

Instructions must be on a two-byte boundary in memory hence the low-order bit of the instruction address is always 0

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 18

SP NOTES

Addressing Mode

1Implied mode Operand is in a register implied by the OpCode of the instruction ExampleADD 312 Immediate mode Operand is a constant value specified in the instruction itself ExampleADD R 103 Register mode Operand is in register specified in the instruction itself ExampleADD S D

4 Register indirect mode Operand are is in a memory address that is content of a register specifiedby the instruction itself ExampleADD (D) 35 Direct mode Absolute address of operand is explicitly specified in instruction ExampleADD 1234 S6 Indirect mode Absolute address of operand is content of a memory address ExampleADD [1234] 107 Relative mode Content of PC + OpAddr ExampleADD D $S8 Indexed mode Content of an index register + OpAddr ExampleADD D 500(S)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 19

SP NOTES

Lecture_no10 Machine language Long way no looping method

Elementary Instruction Set

Typical Data Transfer Instructions

Typical Arithmetic Instructions

Typical Logical and Bit Manipulation Instr

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 20

SP NOTES

Example Write a program that compare two positive numbers x and y Output is -1 if x lt y 0if x = y +1 if x gt y

MOVE R1 x MOVE R2 y MOVE R3 0 SUB R1 R2 BN Less BZ Zero MOVE R3 +1Less MOVE R3 -1Zero MOVE R3 0 HLT

Lecture_no11 Address modification using instruction as data ampusing index register

Lecture_no12 Looping method with example

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 21

SP NOTES

Lecture_no13 Assembly language machine language

Assembly LanguageAssembly instructions are just shorthand for machine instructions

Machine language Equivalent assembly1000000100100101 LOAD R1 51000000101000101 LOAD R2 51010000100000110 ADD R0 R1 R21000001000000110 SAVE R0 61111111111111111 HALT (For all assembly instructions that compute a result the first argument is the destination)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 22

SP NOTES

Very easy to write an Assembly Language -gtMachine language translator

Exercise

What would be the assembly instructions to swap the contents of registers 1 amp 2Exercise Solution

STORE 1 R1MOVE R1 R2LOAD R2 1HALT

Machine Language

The machine language for a particular computer is tied to the architecture of the CPU For example G4 Macs have a different machine language than Intel PCs We will look at the machine language of a simple simulated computer

Machine Language ExampleOur computer has 4 registers and 32 memory locations Each instruction is 16 bits Here is a machine language program for our simulated computer

10000001001001011000000101000101101000010000011010000010000001101111111111111111

LOAD contents of memorylocation into register

100000010 RR MMMMMex R0 = Mem[3]100000010 00 00011

STORE contents of registerinto memory location

100000100 RR MMMMMex Mem[4] = R0100000100 00 00100

MOVE contents of oneregister into another register

100100010000 RR RRex R0 = R1100100010000 00 01

ADD contents of 2 registersstore result in third

1010000100 RR RR RRex R0 = R1 + R21010000100 00 01 10

SUBTRACT contents of 2registers store result into third

1010001000 RR RR RRex R0 = R1 ndash R2

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 23

SP NOTES

1010001000 00 01 10

Halt the program 1111111111111111

Lecture_no14 Meaning of Assembly language program types Literals

bullUSING

bullBALR

bullSTART

bullEND

bullBR

bullLiteral

bullLTORG

bullBCT

bullEQU ExampleSTART EQU 10

The directive EQU stands for equals it allows you to give a numerical value to a label In this example the assembler would always replace START with 1016 Hence you could write

START EQU 10ORG STARTGOTO 10

bullORG

The directive ORG does not get translated into any code what it does is to set the address at which the next instruction will be stored

Example

ORG 0GOTO 10

The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000

But it will be placed at the address 0 ORG 0 does not get translated into any code but it affects where the code for GOTO 10 is placed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 24

SP NOTES

bullpseudo-ops vs machine-ops

Lecture_no15 Example on Assembly language program

Assembly language is a low-level programming language for a computer or other programmable device specific to a particular computer architecture in contrast to most high-level programming languages which are generally portable across multiple systems Assembly language is converted into executable machine code by a utility program referred to as an assembler like NASM MASM etc

LABEL OPCODE OPERANDS COMMENTS01 02 Program to multiply an integer by the constant 603 Before execution an integer must be stored in NUMBER0405 ORIG x305006 LEA R2 NUMBER07 LDW R2 R2 008 LEA R1 SIX09 LDW R1 R1 00A AND R3R30 Clear R3 It will0B contain the product0C The inner loop0D 0E AGAIN ADD R3R3R20F ADD R1R1-1 R1 keeps track of10 BRp AGAIN the iterations11 12 HALT13

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 25

SP NOTES

14 NUMBER BLKW 115 SIX FILL x000616 17 END

Figure 71 An assembly language programTwo of the parts (LABEL and COMMENTS) are optionalOpcodes and OperandsTwo of the parts (OPCODE and OPERANDS) are mandatory The OPCODE is a symbolic name for the opcode

The number of operands depends on the operation being performedAGAIN ADD R3R3R2The operands to be added are obtained from register 2 and from register 3 The resultis to be placed in register 3 We represent each of the registers 0 through 7 as R0 R2helliphelliphelliphellip R7

The location whose address is to be read is given the label NUMBER The destination into which that address is to be loaded is register 2LEA R2 NUMBERA literal value must contain a symbol identifying the representation base of the number We use for decimal x for hexadecimal and b for binary

LabelsLabels are symbolic names which are used to identify memory locations that are referred to explicitly in the programThere are two reasons for explicitly referring to a memory location1 The location contains the target of a branch instruction (for example AGAIN in line 0E)2 The location contains a value that is loaded or stored (for example NUMBER line 14 and SIX line 15)The location AGAIN is specifically referenced by the branch instruction in line 10

BRp AGAIN

If the result of ADD R1R1ndash1 is positive (as evidenced by the P condition code being set) then the program branches to the location explicitly referenced as AGAIN to perform another iterationThe value stored in the memory location explicitly referenced as NUMBER is loaded into R2If a location in the program is not explicitly referenced then there is no need to give it a label

CommentsComments are messages intended only for human consumption The purpose of comments is to make the program more comprehensible to the human reader

Pseudo-ops (Assembler Directives)

Actually a more formal name for a pseudo-op is assembler directive They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution Rather the pseudo-op is strictly a message to the assembler to help the assembler in the assembly processORIG

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 26

SP NOTES

ORIG tells the assembler where in memory to place the program In line 05 ORIG x3050 says start with location x3050 As a result the LEA R2NUMBER instruction will be put in location x3050FILLFILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand In line 15 the ninth location in the resultant program is initialized to the value x0006BLKWBLKW tells the assembler to set aside some number of sequential memory locations (ie a BLocK ofWords) in the program The actual number is the operand of the BLKW pseudo-op In line 11 the pseudo-op instructs the assembler to set aside one location in memory

Lecture_no16 surprise test

Lecture_no17 MODULE-2

AssemblerWhy learning AssemblerAssembler or other languages that is the question Why should I learn another language if I alreadylearned other programming languages The best argument while you live in France you are able to getthrough by speaking english but you will never feel at home then and life remains complicated You canget through with this but it is rather inappropriate If things need a hurry you should use the countryslanguageAssemblerTo translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmerDisassenblerTo translate machine code to their assembly source

Cross AssemblerThe assembler may run on one machine and produce object code for another

An assembly is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader

(Fig Assembler)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 27

SP NOTES

For example the externally defined symbols (library program) must be indicated to the loaderthe assembler does not know the address of these symbols and it is up to the loader to find theprograms containing them load them into memory and place the values of these symbols in thecalling program

(Fig Program Translation Steps)

Fundamental assembler functions1048698 Translate mnemonic operation codes their machine language equivalents1048698 Assign machine addresses to symbolic labels used by the programmer

Basic Assembler Functions

Convert mnemonic operation codes to machine language equivalents

Convert symbolic operands to machine addresses (pass 1)

Build machine instructions in the proper format

Convert data constants to internal (machine)representations

Error checking is provided

Write the object program and assembly listing files

Assembler directives

-gtThe assembler can also process assembler directivesbull Assembler directives (or pseudo-instructions) provide instructions to the assembler itself Theyare not translated into machine instructions

-gtPseudo-Instructions Not translated into machine instructions Providing information to the assembler-gtBasic assembler directives-

bull START -Specify name and staring address for the program END- Indicate the end of the source program and (optionally) specify the first executable instruction in

the program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 28

SP NOTES

BYTE -Generate character or hexadecimal constant occupying as many bytes as needed to represent the constant

WORD- Generate one-word integer constant RESB- Reserve the indicated number of bytes for a data area RESW- Reserve the indicated number of words for a data area

Lecture_no18 General design procedure of 360-type Assembler

Syntax Assembly language statements

Assembly language statements are entered one statement per line Each statement follows the following format

[label] mnemonic [operands] [comment]

The fields in the square brackets are optional A basic instruction has two parts the first one is the name of the instruction (or the mnemonic) which is to be executed and the second are the operands or the parameters of the command

An assembly language statement contains the following fields -gt Label Field -it is an identifier amp optional fieldThe maximum lengnth of label differs between assemblers It can be used to define a symbolSome accepts upto 32 characters long other only 4 charactersA label when declared is suffixed by a colon and begins with valid characters(AhelliphellipZ)

Ex-

START LDAA 24 H

Here Label START is equal to the address of the instruction LDAA 24 H -gt MnemonicOpcode -This field contains a mnemonic It stands for operation code or pseudo-opie a machine code instruction it requires additional information ieoperands -gt Operand Field -it specifies either the address or the data that the opcode requires -gt Comment Field allows the programmer to document the software This field is optional and is only printed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 29

SP NOTES

on the source listing for documentation purposes The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character

Types of Assembly language statements

It consists of 3 typeseg

(i)Imperative statement

(ii)Declarative statements

(iii)Assembler directive statements

Advantages amp disadvantages of Assembly language

Lecture_no19 Design Specification of 1-pass assembler with example

Following steps are used for design specification of an assembler-1)Identify the information necessary to perform a task2)Specify the data structure and define format of data structure3)Determine the processing necessary to obtain and maintain the information4)Determine the processing necessary to perform a task

Assembler Design can be done in

ndashSingle pass

ndashTwo pass

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 30

Source program

Pass1

OPTAB SYMTAB

Intermediate file

Pass2

SYMTAB

Object codes

OPTAB

SP NOTES

The Intermediate file include each source statement assigned address and error indicator

Pass I

Separate the symbols mnemonic op-code and operational fields Determine the storage requirement for every assembly language statement and up date the location

counter Build the symbol table (Table that is used to store each label and its corresponding value)

Pass II

Generate object code(perform actual translation)

One-Pass Assembler

The translation performed by an assembler is essentially a collection of substitutions machine operation code

for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines

bullSingle Pass AssemblerndashDoes everything in single passndashCannot resolve the forward referencing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 31

LOCCTR

SP NOTES

(detailed pass1 assembler flowchart)

Internal data structures1048698 the Operation Code Table (OPTAB) bull OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents1048698 the Symbol Table (SYMTAB) bull SYMTAB is used to store values (addresses) assigned to labels1048698 Location Counter (LOCCTR) bull This is a variable that is used to help in the assignment of address LOCCTR not LOCCTR + (instruction length)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 32

SP NOTES

Lecture_no20 Multi-pass assembler with example

Pass 1 Assign addresses to all statements in the program Save the values assigned to all labels for use in Pass 2 Perform some processing of assembler directivesPass 2 Assemble instructions Generate data values defined by BYTE WORD Perform processing of assembler directives not done in Pass 1 Write the object program and the assembly listing

(Detailed pass2 assembler flowchart)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 33

SP NOTES

Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler

The difference between one pass and two pass assemblers are-A one pass assembler passes over the source file exactly once in the same pass collecting the labels resolving future references and doing the actual assembly The difficult part is to resolve future label references and assemble code in one pass

A two pass assembler does two passes over the source file ( the second pass can be over a file generated in the first pass ) In the first pass all it does is looks for label definitions and introduces them in the symbol table In the second pass after the symbol table is complete it does the actual assembly by translating the operations and so on

Assembler Design Machine Dependent Assembler Features instruction formats and addressing modes program relocation

Machine Independent Assembler Features literals symbol-defining statements expressions program blocks control sections and program linking

Object Program

Header Col 1 H Col 2~7 Program name Col 8~13 Starting address (hex) Col 14-19 Length of object program in bytes (hex) Text Col1 T Col2~7 Starting address in this record (hex) Col 8~9 Length of object code in this record in bytes (hex) Col 10~69Object code (69-10+1)6=10 instructions End Col1 E Col 2~7 Address of 16 Col2 first executable instruction (hex) (END program_name)

Absolute ProgramThe program must be loaded at the specified (absolute) address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 34

SP NOTES

Relocatable Program

The actual starting address is not known until load time The program can be loaded at different address of memory

How to Solvea Modification recordb Base relativec Program counter relative

Modification record Col 1 M Col 2-7 Starting location of the address field to be modified relative to the beginning of the program Col 8-9 length of the address field to be modified in halfbytes

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 35

SP NOTES

Lecture_no22 Macro language amp Macro processor

MACRO PROCESSOR

The assembly language programmer often finds it necessary to repeat some statements or block of code several times in a program The block may consist of code to swap sets of registers do some arithmetic operations In this situation the programmer find a macro instruction facility useful Macro instruction (often called macros) are single line abbreviation for group of instructions In employing a macro the programmer essentially defines a single instruction to represent a block of code For every occurrence of this one-line macro instruction in his program the macro processing assembler substitute the entire block

A macro represents a commonly used group of statements in the source pro ramming language The macro processor replaces each macro instruction with the corresponding group of source language statements This is called expanding of macros

Obj

Basic Macro Processor Functions

Macro Definition and Usage

In this section we will present one more example to highlight salient aspects of macro-processor The example is very similar to Intels 8 bit microprocessor assembly language instruction

ExampleMACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 36

Source code (with macro)

Macro Processor

Expanded code

Compiler Assembler

SP NOTES

A macro definition is placed at the start of a program enclosed between the statements MACRO and ENDM A MACRO statement indicates that a macro definition starts while statement ENDM indicates the end of a macro definition Thus a group of statements start ing with MACRO and ending with ENDM constitutes one macro definition unit If many macros are to be defined in a program as many definition

modules will exist at the start of the program Each definition module contains a new operation and defines it to consist of a sequence of assembly language statement In example above INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence

The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement Appearance of a macro name in the mnemonic field amounts to a call on the macro The assembler replaces such a statement by the statement sequence comprising the macro This is known as macro expansion

INCRMT XY

is shown to lead to insertion of the assembly statements

LOAD X

ADD Y

STORE X

in its place All macro calls in a program are expanded in this fashion

Defining a Macro

Let us take another look at the macro definition unit

MACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

The macro header statement indicates the existence of a macro definition unit Absence of the header statement as the first statement of a program or ft first statement following a macro definition unit signals the start of the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 37

SP NOTES

main assembly language program The next statement in the definition unit is die prototype for a macro call This statement names the macro and indicates how the operands in any call on the macro would be written

The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion

Basic Macro Processor Functions

-gtDirectives used during usage of Macro

Macro Indicates begin of Macro MEND indicates end of Macro

-gtPrototype for Macro

Each argument starts with Name and macro Parameter list

MEND

-gtBODY The statement will be generated as the expansion of Macro

Macro Expansion

Each macro invocation statement will be expanded into the statements that form the body of the macro

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 38

SP NOTES

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)

In the definition of macro parameter In the macro invocation argument

Lecture_no23 Machine-Independent Macro Processor Features

MACRO INVOCATION

A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro

macro_name p1 p2 hellip

Difference between macro call and procedure call

Macro call statements of the macro body are expanded each time the macro is invoked

Procedure call statements of the subroutine appear only one regardless of how many times the subroutine is called

Macro parameters

i) Positional parameter

The prototype statement indicates how operands in a macro call would be written These operands are called parameters or arguments All parameters used in the prototype statement have names starting with the special character amp These parameters are known as formal parameters A macro call is written using parameter names which do not start with ft special character amp These are known as actual parameters

The lists of formal and actual parameters also called as formal and actual parameter lists specified in the prototype and macro call statements respectively establish a correspondence between each formal parameter and an actual parameter In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists Thus the first actual parameter in the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 39

SP NOTES

fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again

INCRMT ampA ampB prototype

INCRMT XY macro call

We see that X would be paired with ampA and Y with ampB While expanding a macro call any formal parameter appearing within a model statement is replaced by the corresponding actual parameter This is how expansion of the call INCR XY heads to the following statements

LOAD X

ADD Y

STORE X

Example

STRG DATA1 DATA2 DATA3

GENER DIRECT3

(ii) Keyword parameter

Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter

ExampleSTRG ampa3=DATA1 ampa2=DATA2 ampa1=DATA3

GENER TYPE=DIRECT CHANNEL=3

Features of macro facility

(1)Macro instruction argument

(2)Macro calls within macro(nested macro calls)

(3)Macro instruction defining(macros within macro)

(4)Conditional macro ExpansionLecture_no24 Design of 1-pass macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 40

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Interpreter

An interpreter program executes other programs directly running through program code and executing it line-by-line As it analyses every line an interpreter is slower than running compiled code but it can take less time to interpret program code than to compile and then run it mdash this is very useful when prototyping and testing code Interpreters are written for multiple platforms this means code written once can be run immediately on different systems without having to recompile for each Examples of this include flash based web programs that will run on your PC MAC games console and Mobile phone

Advantages of using an Interpreter

Easier to debug(check errors) than a compilerEasier to create multi-platform code as each different platform would have an interpreter to run the same

codeUseful for prototyping software and testing basic program logic

Disadvantages of using an Interpreter

Source code is required for the program to be executed and this source code can be read making it insecureInterpreters are generally slower than compiled programs due to the per-line translation method

1)Name the three types of program translator and describe what level of language each translates

Answer

Assembler - Low Level (Assembly) Compiler - High Level Interpreter - High Level

2) Why might you want to use a compiler instead of an interpreter

A compiler makes faster more secure code

3) Why might you want to use an interpreter instead of a compiler

Interpreters allow for code to run on multiple platforms it allows you to debug and test code without having to re-compile the entire source code

Loader

A Loader is system program that place the object program into main memory and prepares it for execution

bull Basic functions of loader

ndash Allocation

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 5

SP NOTES

ndash Linking

ndash Relocation

ndash Loading

Types of Loader-

bull Compile-and-go Loader

bull Relocating Loader

bull Direct Linking Loader

bull Absolute Loader

bull General Loader

bull Dynamic Loader

Macro amp Macro processor

Macro ndash Macro is a single line abbreviation for a group of instruction

MACRO -------- Start of definition

INCR -------- Macro name

A 1DATA

A 2DATA Sequence of instructions to be abbreviated

A 3DATA

MEND -------- End of definition

Linking and Linker

bull Linking ndash The Process of merging many object modules to form a single object program is called as linking

bull Linker bull The Linker is the software program which binds many object modules to make a single object program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 6

SP NOTES

Lecture_no4 Foundation of system programming

1) Formal Systembull A formal system is an un interpreted calculus

It consists of

ndash Alphabets

ndash A set of words called Axioms

ndash Finite set of relations called rules of inference or production rules

ndash Ex Boolean algebra

2) Need Of System Software-The basic need of system software is to achieve the following goals -

bull To achieve efficient performance of the system

bull To make effective execution of general user program

bull To make effective utilization of human resources

bull To make available new better facilities

Need of system programming

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 7

SP NOTES

Lecture_no5 Evolution of operating systemGeneral machine structure

OSJobmultiprogrammingMultitasking

FragmentationPagingScheduling

Facilities of computer os

Operating Systembull It is the collection of system programs which acts as an interface between user and the computer and computer hardware

bull The purpose of an operating system is to provide an environment in which A user can execute programs in a convenient manner

Functions of Operating System

bull File handling and management

bull Storage management (Memory management)

bull Device scheduling and management

bull CPU scheduling

bull Information management

bull Process control (management)

bull Error handling

bull Protecting itself from user amp protecting user from other users

What is MFT

What is MVT

What is Fragmentation

What is paging Explain its various types

What is scheduling

What is meant by the term time sharing

What is virtual memoryWhat are simple batched systems Explain

Give an evolution of an operating system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 8

SP NOTES

Lecture_no6 Description of Von-Neumann machines components

General machine structureVon-Neumannrsquos componentsBlock diagram and functions of each component of Von-neumannrsquos machine

1 The von Neumann Computer Modelbull Von Neumann computer systems contain three main building blocks-gt the central processing unit (CPU)-gt memory-gt and inputoutput devices (IO)bull These three components are connected together using the system busbull The most prominent items within the CPU are the registers they can be manipulated directly by acomputer program The following block diagram shows major relationship between CPU components

(Design of the VonNeumann architecture)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 9

SP NOTES

2 Components of the Von Neumann Model-gt Memory Storage of information (dataprogram)-gt Processing Unit ComputationProcessing of Information-gt Input Means of getting information into the computer eg keyboard mouse-gtOutput Means of getting information out of the computer eg printer monitor bull Control Unit Makes sure that all the other parts perform their tasks correctly and at the correct time The von Neumann Machine

(connection between CPU amp Main memory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 10

SP NOTES

MBR (Memory Buffer Register) = MBR contains a word to be stored in memory or is sued to receive from memory

MAR (Memory Address Register) = MAR specifies the address in memory of the word to be written from or read into the MBR

IR (Instruction Register) = IR contains the 8-bit op-code instruction being executedIBR (Instruction Buffer Register) = IBR is employed to hold temporarily the right-hand instructions

from a word in memoryPC (Program Counter) = PC contains the address of the next instruction-pair to be fetched from

memoryAC and MQ (Accumulator and Multiplier Quotient) = AC and MQ are employed to hold temporarily

operands and results of ALU operations

Fetch The process of retrieving a value from a memory location in order to put it in a register or pass it to a processing unit

IO controller A special-purpose processor that mediates between the main computer processor and a specific input or output device The processor records the request made by the main processor leaving it free to continue other tasks and interrupts the main processor when the input or output task is complete

Input Output The subsystem responsible for managing communications with external devices whether external memory storage other processors or devices for human interaction

Instruction set The set of codes that describe all the legal operations a particular computer can execute

Op code An unsigned binary integer that is assigned to a specific task the hardware can perform

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 11

SP NOTES

Register A special kind of very fast memory which is referred to by name rather than by a memory address number Registers often hold data values that are currently in use by the ALU or other subsystems of the architecture

Random access memory (RAM) The typical computer memory where each cell is addressed individually and may be updated or accessed in time that is independent of the cellrsquos location

Read-only memory (ROM) A memory that is usually very similar to RAM except it may only be accessed not updated

Store The process of writing a value to a memory cell

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 12

SP NOTES

Lecture_no7 Instruction format with microflowchart for ADD instruction

Flowchart for instruction fetchamp execution of general machine structure

Typical operating system Information flow in microflowchart for ADDSUBMULT

Memory operations

FETCH (addr)

1Put addr into MAR2Tell memory unit to ldquoloadrdquo3Memory copies data into MDR

STORE(addrnew-value)

1 Put addr into MAR2 Put new-value into MDR3 Tell memory unit to ldquostorerdquo4 Memory stores data from MDR into memory cell

Problem 1)

a) Write an assembly program for a GPR machine having 8 general purpose registers (R1R7) to evaluate the following expression Ignore the remainder part of the division process

( (a + b) (a + d) ) (a e)

where abde are variables with addresses ABDE respectively and each operation is defined as follows + --------gt ADD Ri Rj ( Rj lt- Ri + Rj) --------gt MULT Ri Rj ( Rj lt- Ri Rj) --------gt DIV Ri Rj ( Rj lt- Rj Ri)

a = M[A] ------gt content of memory location at address A same for B D E

move A D0 D0 = amove B D4 D4 = badd D0 D4 D4 = a + bmove D D1 D1 = dadd D0 D1 D1 = a + ddiv D1D4 D4 = (a + b) (a + d)move E D2 D2 = emult D0 D2 D2 = (a e)mult D4 D2 D2 = ( (a + b) (a + d) ) (a e)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 13

SP NOTES

Problem 2)

a) Provide micro-instructions in the form of detailed flowchart for the execution of the following instructions

b) Show the flow of the micro-instructions within the GPR architecture by highlighting the system buses and the registers used

i)move X D0 move contents of memory location X to register D0ii)move D0 X move contents of register D0 to memory location Xiii)movea 22(a3)25(a1) move contents from memory to memoryiv)bgtw label branch if greater than to location ldquolabelrdquo

Answer 2)

For each of the above for part b) draw the CPU consisting of all the registers (for example MAR MBR PC IR Address and Data Registers ALU) and Memory and show each of the micro-instruction as in part a)

i) move X D0Machine Instruction

0011 0000 0011 1000

Address of X

a) Micro-steps are

MAR lt-- PC

MBR lt-- M[MAR] fetch

IR lt-- MBR

| decode |PC lt-- PC + 2 PC points to source addressMAR lt-- PCMBR lt-- M[MAR] address of X in MBRMAR MBR move X to MARMBR M[MAR] read contents of location XD0 MBR PC lt-- PC + 2 PC points to next instruction

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 14

SP NOTES

ii) move D0 X

Machine Instruction

0011 1110 0000 0000

Address of X

a) Micro-steps areMAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | decode |PC lt-- PC + 2 PC points to destination addressMAR lt-- PCMBR M[MAR] read destination address XMAR MBR MAR has X now

MBR D0 source data in MBR

M[MAR] MBR data in Effective address X (which is in MAR)

PC lt-- PC + 2 PC points to next instruction

iii) movea 22(a3)25(a1)

Machine Instruction

0011 0011 0110 1011

0000 0000 0001 0110

0000 0000 0001 1001

a)Source Effective address = contents of a3 + $16temp lt--- M[a3 + $16]

Destination Effective address = contents of a1 + $19M[a1 + $19] lt- temp

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 15

SP NOTES

Micro-steps areMAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | decode |PC lt-- PC + 2 PC points to source displacementMAR lt-- PCMBR lt-- M[MAR] source displacement loadedMAR lt-- [A3 + MBR] source effective address calculatedMBR lt- M[MAR] source data in MBRWR lt- MBR saved temporarily since MBR is to be reusedPC lt-- PC + 2 PC points to destination displacementMAR lt-- PCMBR lt-- M[MAR] destination displacement loaded

MAR lt-- [A1 + MBR] destination effective address calculatedMBR lt- WR source data back to MBR

M[MAR] lt-- MBR source data moved to destination

PC lt-- PC + 2 PC points to next instruction

iv) bgtw label

Machine Instruction

0110 1110 0000 0000 ---- displacement ----

a)MAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | interpret decode |

TRUE FALSE | |

PC lt-- PC + 2 PC lt- PC + 4 MAR lt-- PC MBR lt-- M[MAR] get displacement PC lt-- PC + MBR execute

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 16

SP NOTES

Lecture_no8 Memory unit Register Data of IBM 360

General machine structure of IBM 360370

The IBM System360 (S360) was a mainframe computer system family announced by IBM on April 7 1964 and delivered between 1965 and 1978 It was the first family of computers designed to cover the complete range of applications from small to large both commercial and scientific

(i)Memory unit

Memory (storage) in System360 is addressed in terms of 8-bit bytes Various instructions operate on larger units called halfword (2 bytes) fullword (4 bytes) doubleword (8 bytes) quad word (16 bytes) and 2048 byte storage block specifying the leftmost (lowest address) of the unit Within a halfword fullword doubleword or quadword low numbered bytes are more significant than high numbered bytes this is sometimes referred to as big-endian Many uses for these units require aligning them on the corresponding boundaries Within this article the unqualified term word refers to a fullword

The architecture of System360 provided for up to 224 = 16777216 bytes of memory however the Model 67 extended the architecture and allowed 232 = 4294967296 bytes of virtual memory

(ii)Register-

Registers are areas of storage that are set aside in the processing unit

There are 16 registers in the 370 system (General purpose registerFloating-point registerControl amp status registerdata register address register)

Four fields of information-1) the name field- optional field and must begin with a character from the alphabet can consist of 8 characters2) opcode ndash mandatory field and must be followed by a blank3) operands ndash usually represent data that could be in main storage registers or as part of the instruction itself4) comments ndash used to clarify instructions

(iii)Data

Characters are stored as 8-bit bytes Integers are stored as twos complement binary halfword or fullword values Packed decimal numbers are stored as 1-16 8-bit bytes containing an odd number of decimal digits

followed by a 4-bit sign Sign values of hexadecimal A C E and F are positive and sign values of hexadecimal B and D are negative Digit values of hexadecimal A-F and sign values of 0-9 are invalid but the PACK and UNPK instructions do not test for validity

Zoned decimal numbers are stored as 1-16 8-bit bytes each containing a zone in bits 0-3 and a digit in bits 4-7 The zone of the rightmost byte is interpreted as a sign

Floating point numbers are only stored as fullword or doubleword values on older models On the 36085and 360195there are also extended precision floating point numbers stored as quadwords For all

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 17

SP NOTES

three formats bit 0 is a sign and bits 0-7 are a characteristic (exponent biased by 64) Bits 8-31 (8-63) are a hexadecimal fraction For extended precision the low order doubleword has its own sign and characteristic which are ignored on input and generated on output

Lecture_no9 Various instruction format of IBM 360370

( iv)Instruction

Instructions in the S360 are two four or six bytes in length with the opcode in byte 0 Instructions have one of the following formats

RR (two bytes) R tells that there is data residing in a register So in this type there is data in register R

Generally byte 1 specifies two 4-bit register numbers but in some cases eg SVC byte 1 is a single 8-bit immediate field

RS (four bytes) This instruction can have 2 or 3 operands It depends how many the opcode indicates If it is 2 the first is a register and the second storage If it is 3 then the first and second are registers and the third is a storage location

Byte 1 specifies two register numbers bytes 2-3 specify a base and displacement

RX (four bytes) The X also refers to a storage address however this specifies actual storage addresses in 3 pieces

Byte 1 bits 0-3 specifies either a register number or a modifier byte 1 bits 4-7 specifies the number of the general register to be used as an index bytes 2-3 specify a base and displacement

SI (four bytes) I tells that the operand consists of immediate data (meaning that the data is in the instruction itself) So in this type the first is a storage location and the second is immediate data

Byte 1 specifies an immediate field bytes 2-3 specify a base and displacement

SS (six bytes) S tells that that there is data in storage Here both operands have data in storage

Byte 1 specifies two 4-bit length fields or one 8-bit length field bytes 2-3 and 4-5 each specify a base and displacement The encoding of the length fields is length-1

Instructions must be on a two-byte boundary in memory hence the low-order bit of the instruction address is always 0

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 18

SP NOTES

Addressing Mode

1Implied mode Operand is in a register implied by the OpCode of the instruction ExampleADD 312 Immediate mode Operand is a constant value specified in the instruction itself ExampleADD R 103 Register mode Operand is in register specified in the instruction itself ExampleADD S D

4 Register indirect mode Operand are is in a memory address that is content of a register specifiedby the instruction itself ExampleADD (D) 35 Direct mode Absolute address of operand is explicitly specified in instruction ExampleADD 1234 S6 Indirect mode Absolute address of operand is content of a memory address ExampleADD [1234] 107 Relative mode Content of PC + OpAddr ExampleADD D $S8 Indexed mode Content of an index register + OpAddr ExampleADD D 500(S)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 19

SP NOTES

Lecture_no10 Machine language Long way no looping method

Elementary Instruction Set

Typical Data Transfer Instructions

Typical Arithmetic Instructions

Typical Logical and Bit Manipulation Instr

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 20

SP NOTES

Example Write a program that compare two positive numbers x and y Output is -1 if x lt y 0if x = y +1 if x gt y

MOVE R1 x MOVE R2 y MOVE R3 0 SUB R1 R2 BN Less BZ Zero MOVE R3 +1Less MOVE R3 -1Zero MOVE R3 0 HLT

Lecture_no11 Address modification using instruction as data ampusing index register

Lecture_no12 Looping method with example

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 21

SP NOTES

Lecture_no13 Assembly language machine language

Assembly LanguageAssembly instructions are just shorthand for machine instructions

Machine language Equivalent assembly1000000100100101 LOAD R1 51000000101000101 LOAD R2 51010000100000110 ADD R0 R1 R21000001000000110 SAVE R0 61111111111111111 HALT (For all assembly instructions that compute a result the first argument is the destination)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 22

SP NOTES

Very easy to write an Assembly Language -gtMachine language translator

Exercise

What would be the assembly instructions to swap the contents of registers 1 amp 2Exercise Solution

STORE 1 R1MOVE R1 R2LOAD R2 1HALT

Machine Language

The machine language for a particular computer is tied to the architecture of the CPU For example G4 Macs have a different machine language than Intel PCs We will look at the machine language of a simple simulated computer

Machine Language ExampleOur computer has 4 registers and 32 memory locations Each instruction is 16 bits Here is a machine language program for our simulated computer

10000001001001011000000101000101101000010000011010000010000001101111111111111111

LOAD contents of memorylocation into register

100000010 RR MMMMMex R0 = Mem[3]100000010 00 00011

STORE contents of registerinto memory location

100000100 RR MMMMMex Mem[4] = R0100000100 00 00100

MOVE contents of oneregister into another register

100100010000 RR RRex R0 = R1100100010000 00 01

ADD contents of 2 registersstore result in third

1010000100 RR RR RRex R0 = R1 + R21010000100 00 01 10

SUBTRACT contents of 2registers store result into third

1010001000 RR RR RRex R0 = R1 ndash R2

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 23

SP NOTES

1010001000 00 01 10

Halt the program 1111111111111111

Lecture_no14 Meaning of Assembly language program types Literals

bullUSING

bullBALR

bullSTART

bullEND

bullBR

bullLiteral

bullLTORG

bullBCT

bullEQU ExampleSTART EQU 10

The directive EQU stands for equals it allows you to give a numerical value to a label In this example the assembler would always replace START with 1016 Hence you could write

START EQU 10ORG STARTGOTO 10

bullORG

The directive ORG does not get translated into any code what it does is to set the address at which the next instruction will be stored

Example

ORG 0GOTO 10

The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000

But it will be placed at the address 0 ORG 0 does not get translated into any code but it affects where the code for GOTO 10 is placed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 24

SP NOTES

bullpseudo-ops vs machine-ops

Lecture_no15 Example on Assembly language program

Assembly language is a low-level programming language for a computer or other programmable device specific to a particular computer architecture in contrast to most high-level programming languages which are generally portable across multiple systems Assembly language is converted into executable machine code by a utility program referred to as an assembler like NASM MASM etc

LABEL OPCODE OPERANDS COMMENTS01 02 Program to multiply an integer by the constant 603 Before execution an integer must be stored in NUMBER0405 ORIG x305006 LEA R2 NUMBER07 LDW R2 R2 008 LEA R1 SIX09 LDW R1 R1 00A AND R3R30 Clear R3 It will0B contain the product0C The inner loop0D 0E AGAIN ADD R3R3R20F ADD R1R1-1 R1 keeps track of10 BRp AGAIN the iterations11 12 HALT13

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 25

SP NOTES

14 NUMBER BLKW 115 SIX FILL x000616 17 END

Figure 71 An assembly language programTwo of the parts (LABEL and COMMENTS) are optionalOpcodes and OperandsTwo of the parts (OPCODE and OPERANDS) are mandatory The OPCODE is a symbolic name for the opcode

The number of operands depends on the operation being performedAGAIN ADD R3R3R2The operands to be added are obtained from register 2 and from register 3 The resultis to be placed in register 3 We represent each of the registers 0 through 7 as R0 R2helliphelliphelliphellip R7

The location whose address is to be read is given the label NUMBER The destination into which that address is to be loaded is register 2LEA R2 NUMBERA literal value must contain a symbol identifying the representation base of the number We use for decimal x for hexadecimal and b for binary

LabelsLabels are symbolic names which are used to identify memory locations that are referred to explicitly in the programThere are two reasons for explicitly referring to a memory location1 The location contains the target of a branch instruction (for example AGAIN in line 0E)2 The location contains a value that is loaded or stored (for example NUMBER line 14 and SIX line 15)The location AGAIN is specifically referenced by the branch instruction in line 10

BRp AGAIN

If the result of ADD R1R1ndash1 is positive (as evidenced by the P condition code being set) then the program branches to the location explicitly referenced as AGAIN to perform another iterationThe value stored in the memory location explicitly referenced as NUMBER is loaded into R2If a location in the program is not explicitly referenced then there is no need to give it a label

CommentsComments are messages intended only for human consumption The purpose of comments is to make the program more comprehensible to the human reader

Pseudo-ops (Assembler Directives)

Actually a more formal name for a pseudo-op is assembler directive They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution Rather the pseudo-op is strictly a message to the assembler to help the assembler in the assembly processORIG

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 26

SP NOTES

ORIG tells the assembler where in memory to place the program In line 05 ORIG x3050 says start with location x3050 As a result the LEA R2NUMBER instruction will be put in location x3050FILLFILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand In line 15 the ninth location in the resultant program is initialized to the value x0006BLKWBLKW tells the assembler to set aside some number of sequential memory locations (ie a BLocK ofWords) in the program The actual number is the operand of the BLKW pseudo-op In line 11 the pseudo-op instructs the assembler to set aside one location in memory

Lecture_no16 surprise test

Lecture_no17 MODULE-2

AssemblerWhy learning AssemblerAssembler or other languages that is the question Why should I learn another language if I alreadylearned other programming languages The best argument while you live in France you are able to getthrough by speaking english but you will never feel at home then and life remains complicated You canget through with this but it is rather inappropriate If things need a hurry you should use the countryslanguageAssemblerTo translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmerDisassenblerTo translate machine code to their assembly source

Cross AssemblerThe assembler may run on one machine and produce object code for another

An assembly is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader

(Fig Assembler)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 27

SP NOTES

For example the externally defined symbols (library program) must be indicated to the loaderthe assembler does not know the address of these symbols and it is up to the loader to find theprograms containing them load them into memory and place the values of these symbols in thecalling program

(Fig Program Translation Steps)

Fundamental assembler functions1048698 Translate mnemonic operation codes their machine language equivalents1048698 Assign machine addresses to symbolic labels used by the programmer

Basic Assembler Functions

Convert mnemonic operation codes to machine language equivalents

Convert symbolic operands to machine addresses (pass 1)

Build machine instructions in the proper format

Convert data constants to internal (machine)representations

Error checking is provided

Write the object program and assembly listing files

Assembler directives

-gtThe assembler can also process assembler directivesbull Assembler directives (or pseudo-instructions) provide instructions to the assembler itself Theyare not translated into machine instructions

-gtPseudo-Instructions Not translated into machine instructions Providing information to the assembler-gtBasic assembler directives-

bull START -Specify name and staring address for the program END- Indicate the end of the source program and (optionally) specify the first executable instruction in

the program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 28

SP NOTES

BYTE -Generate character or hexadecimal constant occupying as many bytes as needed to represent the constant

WORD- Generate one-word integer constant RESB- Reserve the indicated number of bytes for a data area RESW- Reserve the indicated number of words for a data area

Lecture_no18 General design procedure of 360-type Assembler

Syntax Assembly language statements

Assembly language statements are entered one statement per line Each statement follows the following format

[label] mnemonic [operands] [comment]

The fields in the square brackets are optional A basic instruction has two parts the first one is the name of the instruction (or the mnemonic) which is to be executed and the second are the operands or the parameters of the command

An assembly language statement contains the following fields -gt Label Field -it is an identifier amp optional fieldThe maximum lengnth of label differs between assemblers It can be used to define a symbolSome accepts upto 32 characters long other only 4 charactersA label when declared is suffixed by a colon and begins with valid characters(AhelliphellipZ)

Ex-

START LDAA 24 H

Here Label START is equal to the address of the instruction LDAA 24 H -gt MnemonicOpcode -This field contains a mnemonic It stands for operation code or pseudo-opie a machine code instruction it requires additional information ieoperands -gt Operand Field -it specifies either the address or the data that the opcode requires -gt Comment Field allows the programmer to document the software This field is optional and is only printed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 29

SP NOTES

on the source listing for documentation purposes The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character

Types of Assembly language statements

It consists of 3 typeseg

(i)Imperative statement

(ii)Declarative statements

(iii)Assembler directive statements

Advantages amp disadvantages of Assembly language

Lecture_no19 Design Specification of 1-pass assembler with example

Following steps are used for design specification of an assembler-1)Identify the information necessary to perform a task2)Specify the data structure and define format of data structure3)Determine the processing necessary to obtain and maintain the information4)Determine the processing necessary to perform a task

Assembler Design can be done in

ndashSingle pass

ndashTwo pass

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 30

Source program

Pass1

OPTAB SYMTAB

Intermediate file

Pass2

SYMTAB

Object codes

OPTAB

SP NOTES

The Intermediate file include each source statement assigned address and error indicator

Pass I

Separate the symbols mnemonic op-code and operational fields Determine the storage requirement for every assembly language statement and up date the location

counter Build the symbol table (Table that is used to store each label and its corresponding value)

Pass II

Generate object code(perform actual translation)

One-Pass Assembler

The translation performed by an assembler is essentially a collection of substitutions machine operation code

for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines

bullSingle Pass AssemblerndashDoes everything in single passndashCannot resolve the forward referencing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 31

LOCCTR

SP NOTES

(detailed pass1 assembler flowchart)

Internal data structures1048698 the Operation Code Table (OPTAB) bull OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents1048698 the Symbol Table (SYMTAB) bull SYMTAB is used to store values (addresses) assigned to labels1048698 Location Counter (LOCCTR) bull This is a variable that is used to help in the assignment of address LOCCTR not LOCCTR + (instruction length)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 32

SP NOTES

Lecture_no20 Multi-pass assembler with example

Pass 1 Assign addresses to all statements in the program Save the values assigned to all labels for use in Pass 2 Perform some processing of assembler directivesPass 2 Assemble instructions Generate data values defined by BYTE WORD Perform processing of assembler directives not done in Pass 1 Write the object program and the assembly listing

(Detailed pass2 assembler flowchart)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 33

SP NOTES

Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler

The difference between one pass and two pass assemblers are-A one pass assembler passes over the source file exactly once in the same pass collecting the labels resolving future references and doing the actual assembly The difficult part is to resolve future label references and assemble code in one pass

A two pass assembler does two passes over the source file ( the second pass can be over a file generated in the first pass ) In the first pass all it does is looks for label definitions and introduces them in the symbol table In the second pass after the symbol table is complete it does the actual assembly by translating the operations and so on

Assembler Design Machine Dependent Assembler Features instruction formats and addressing modes program relocation

Machine Independent Assembler Features literals symbol-defining statements expressions program blocks control sections and program linking

Object Program

Header Col 1 H Col 2~7 Program name Col 8~13 Starting address (hex) Col 14-19 Length of object program in bytes (hex) Text Col1 T Col2~7 Starting address in this record (hex) Col 8~9 Length of object code in this record in bytes (hex) Col 10~69Object code (69-10+1)6=10 instructions End Col1 E Col 2~7 Address of 16 Col2 first executable instruction (hex) (END program_name)

Absolute ProgramThe program must be loaded at the specified (absolute) address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 34

SP NOTES

Relocatable Program

The actual starting address is not known until load time The program can be loaded at different address of memory

How to Solvea Modification recordb Base relativec Program counter relative

Modification record Col 1 M Col 2-7 Starting location of the address field to be modified relative to the beginning of the program Col 8-9 length of the address field to be modified in halfbytes

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 35

SP NOTES

Lecture_no22 Macro language amp Macro processor

MACRO PROCESSOR

The assembly language programmer often finds it necessary to repeat some statements or block of code several times in a program The block may consist of code to swap sets of registers do some arithmetic operations In this situation the programmer find a macro instruction facility useful Macro instruction (often called macros) are single line abbreviation for group of instructions In employing a macro the programmer essentially defines a single instruction to represent a block of code For every occurrence of this one-line macro instruction in his program the macro processing assembler substitute the entire block

A macro represents a commonly used group of statements in the source pro ramming language The macro processor replaces each macro instruction with the corresponding group of source language statements This is called expanding of macros

Obj

Basic Macro Processor Functions

Macro Definition and Usage

In this section we will present one more example to highlight salient aspects of macro-processor The example is very similar to Intels 8 bit microprocessor assembly language instruction

ExampleMACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 36

Source code (with macro)

Macro Processor

Expanded code

Compiler Assembler

SP NOTES

A macro definition is placed at the start of a program enclosed between the statements MACRO and ENDM A MACRO statement indicates that a macro definition starts while statement ENDM indicates the end of a macro definition Thus a group of statements start ing with MACRO and ending with ENDM constitutes one macro definition unit If many macros are to be defined in a program as many definition

modules will exist at the start of the program Each definition module contains a new operation and defines it to consist of a sequence of assembly language statement In example above INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence

The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement Appearance of a macro name in the mnemonic field amounts to a call on the macro The assembler replaces such a statement by the statement sequence comprising the macro This is known as macro expansion

INCRMT XY

is shown to lead to insertion of the assembly statements

LOAD X

ADD Y

STORE X

in its place All macro calls in a program are expanded in this fashion

Defining a Macro

Let us take another look at the macro definition unit

MACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

The macro header statement indicates the existence of a macro definition unit Absence of the header statement as the first statement of a program or ft first statement following a macro definition unit signals the start of the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 37

SP NOTES

main assembly language program The next statement in the definition unit is die prototype for a macro call This statement names the macro and indicates how the operands in any call on the macro would be written

The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion

Basic Macro Processor Functions

-gtDirectives used during usage of Macro

Macro Indicates begin of Macro MEND indicates end of Macro

-gtPrototype for Macro

Each argument starts with Name and macro Parameter list

MEND

-gtBODY The statement will be generated as the expansion of Macro

Macro Expansion

Each macro invocation statement will be expanded into the statements that form the body of the macro

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 38

SP NOTES

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)

In the definition of macro parameter In the macro invocation argument

Lecture_no23 Machine-Independent Macro Processor Features

MACRO INVOCATION

A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro

macro_name p1 p2 hellip

Difference between macro call and procedure call

Macro call statements of the macro body are expanded each time the macro is invoked

Procedure call statements of the subroutine appear only one regardless of how many times the subroutine is called

Macro parameters

i) Positional parameter

The prototype statement indicates how operands in a macro call would be written These operands are called parameters or arguments All parameters used in the prototype statement have names starting with the special character amp These parameters are known as formal parameters A macro call is written using parameter names which do not start with ft special character amp These are known as actual parameters

The lists of formal and actual parameters also called as formal and actual parameter lists specified in the prototype and macro call statements respectively establish a correspondence between each formal parameter and an actual parameter In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists Thus the first actual parameter in the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 39

SP NOTES

fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again

INCRMT ampA ampB prototype

INCRMT XY macro call

We see that X would be paired with ampA and Y with ampB While expanding a macro call any formal parameter appearing within a model statement is replaced by the corresponding actual parameter This is how expansion of the call INCR XY heads to the following statements

LOAD X

ADD Y

STORE X

Example

STRG DATA1 DATA2 DATA3

GENER DIRECT3

(ii) Keyword parameter

Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter

ExampleSTRG ampa3=DATA1 ampa2=DATA2 ampa1=DATA3

GENER TYPE=DIRECT CHANNEL=3

Features of macro facility

(1)Macro instruction argument

(2)Macro calls within macro(nested macro calls)

(3)Macro instruction defining(macros within macro)

(4)Conditional macro ExpansionLecture_no24 Design of 1-pass macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 40

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

ndash Linking

ndash Relocation

ndash Loading

Types of Loader-

bull Compile-and-go Loader

bull Relocating Loader

bull Direct Linking Loader

bull Absolute Loader

bull General Loader

bull Dynamic Loader

Macro amp Macro processor

Macro ndash Macro is a single line abbreviation for a group of instruction

MACRO -------- Start of definition

INCR -------- Macro name

A 1DATA

A 2DATA Sequence of instructions to be abbreviated

A 3DATA

MEND -------- End of definition

Linking and Linker

bull Linking ndash The Process of merging many object modules to form a single object program is called as linking

bull Linker bull The Linker is the software program which binds many object modules to make a single object program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 6

SP NOTES

Lecture_no4 Foundation of system programming

1) Formal Systembull A formal system is an un interpreted calculus

It consists of

ndash Alphabets

ndash A set of words called Axioms

ndash Finite set of relations called rules of inference or production rules

ndash Ex Boolean algebra

2) Need Of System Software-The basic need of system software is to achieve the following goals -

bull To achieve efficient performance of the system

bull To make effective execution of general user program

bull To make effective utilization of human resources

bull To make available new better facilities

Need of system programming

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 7

SP NOTES

Lecture_no5 Evolution of operating systemGeneral machine structure

OSJobmultiprogrammingMultitasking

FragmentationPagingScheduling

Facilities of computer os

Operating Systembull It is the collection of system programs which acts as an interface between user and the computer and computer hardware

bull The purpose of an operating system is to provide an environment in which A user can execute programs in a convenient manner

Functions of Operating System

bull File handling and management

bull Storage management (Memory management)

bull Device scheduling and management

bull CPU scheduling

bull Information management

bull Process control (management)

bull Error handling

bull Protecting itself from user amp protecting user from other users

What is MFT

What is MVT

What is Fragmentation

What is paging Explain its various types

What is scheduling

What is meant by the term time sharing

What is virtual memoryWhat are simple batched systems Explain

Give an evolution of an operating system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 8

SP NOTES

Lecture_no6 Description of Von-Neumann machines components

General machine structureVon-Neumannrsquos componentsBlock diagram and functions of each component of Von-neumannrsquos machine

1 The von Neumann Computer Modelbull Von Neumann computer systems contain three main building blocks-gt the central processing unit (CPU)-gt memory-gt and inputoutput devices (IO)bull These three components are connected together using the system busbull The most prominent items within the CPU are the registers they can be manipulated directly by acomputer program The following block diagram shows major relationship between CPU components

(Design of the VonNeumann architecture)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 9

SP NOTES

2 Components of the Von Neumann Model-gt Memory Storage of information (dataprogram)-gt Processing Unit ComputationProcessing of Information-gt Input Means of getting information into the computer eg keyboard mouse-gtOutput Means of getting information out of the computer eg printer monitor bull Control Unit Makes sure that all the other parts perform their tasks correctly and at the correct time The von Neumann Machine

(connection between CPU amp Main memory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 10

SP NOTES

MBR (Memory Buffer Register) = MBR contains a word to be stored in memory or is sued to receive from memory

MAR (Memory Address Register) = MAR specifies the address in memory of the word to be written from or read into the MBR

IR (Instruction Register) = IR contains the 8-bit op-code instruction being executedIBR (Instruction Buffer Register) = IBR is employed to hold temporarily the right-hand instructions

from a word in memoryPC (Program Counter) = PC contains the address of the next instruction-pair to be fetched from

memoryAC and MQ (Accumulator and Multiplier Quotient) = AC and MQ are employed to hold temporarily

operands and results of ALU operations

Fetch The process of retrieving a value from a memory location in order to put it in a register or pass it to a processing unit

IO controller A special-purpose processor that mediates between the main computer processor and a specific input or output device The processor records the request made by the main processor leaving it free to continue other tasks and interrupts the main processor when the input or output task is complete

Input Output The subsystem responsible for managing communications with external devices whether external memory storage other processors or devices for human interaction

Instruction set The set of codes that describe all the legal operations a particular computer can execute

Op code An unsigned binary integer that is assigned to a specific task the hardware can perform

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 11

SP NOTES

Register A special kind of very fast memory which is referred to by name rather than by a memory address number Registers often hold data values that are currently in use by the ALU or other subsystems of the architecture

Random access memory (RAM) The typical computer memory where each cell is addressed individually and may be updated or accessed in time that is independent of the cellrsquos location

Read-only memory (ROM) A memory that is usually very similar to RAM except it may only be accessed not updated

Store The process of writing a value to a memory cell

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 12

SP NOTES

Lecture_no7 Instruction format with microflowchart for ADD instruction

Flowchart for instruction fetchamp execution of general machine structure

Typical operating system Information flow in microflowchart for ADDSUBMULT

Memory operations

FETCH (addr)

1Put addr into MAR2Tell memory unit to ldquoloadrdquo3Memory copies data into MDR

STORE(addrnew-value)

1 Put addr into MAR2 Put new-value into MDR3 Tell memory unit to ldquostorerdquo4 Memory stores data from MDR into memory cell

Problem 1)

a) Write an assembly program for a GPR machine having 8 general purpose registers (R1R7) to evaluate the following expression Ignore the remainder part of the division process

( (a + b) (a + d) ) (a e)

where abde are variables with addresses ABDE respectively and each operation is defined as follows + --------gt ADD Ri Rj ( Rj lt- Ri + Rj) --------gt MULT Ri Rj ( Rj lt- Ri Rj) --------gt DIV Ri Rj ( Rj lt- Rj Ri)

a = M[A] ------gt content of memory location at address A same for B D E

move A D0 D0 = amove B D4 D4 = badd D0 D4 D4 = a + bmove D D1 D1 = dadd D0 D1 D1 = a + ddiv D1D4 D4 = (a + b) (a + d)move E D2 D2 = emult D0 D2 D2 = (a e)mult D4 D2 D2 = ( (a + b) (a + d) ) (a e)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 13

SP NOTES

Problem 2)

a) Provide micro-instructions in the form of detailed flowchart for the execution of the following instructions

b) Show the flow of the micro-instructions within the GPR architecture by highlighting the system buses and the registers used

i)move X D0 move contents of memory location X to register D0ii)move D0 X move contents of register D0 to memory location Xiii)movea 22(a3)25(a1) move contents from memory to memoryiv)bgtw label branch if greater than to location ldquolabelrdquo

Answer 2)

For each of the above for part b) draw the CPU consisting of all the registers (for example MAR MBR PC IR Address and Data Registers ALU) and Memory and show each of the micro-instruction as in part a)

i) move X D0Machine Instruction

0011 0000 0011 1000

Address of X

a) Micro-steps are

MAR lt-- PC

MBR lt-- M[MAR] fetch

IR lt-- MBR

| decode |PC lt-- PC + 2 PC points to source addressMAR lt-- PCMBR lt-- M[MAR] address of X in MBRMAR MBR move X to MARMBR M[MAR] read contents of location XD0 MBR PC lt-- PC + 2 PC points to next instruction

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 14

SP NOTES

ii) move D0 X

Machine Instruction

0011 1110 0000 0000

Address of X

a) Micro-steps areMAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | decode |PC lt-- PC + 2 PC points to destination addressMAR lt-- PCMBR M[MAR] read destination address XMAR MBR MAR has X now

MBR D0 source data in MBR

M[MAR] MBR data in Effective address X (which is in MAR)

PC lt-- PC + 2 PC points to next instruction

iii) movea 22(a3)25(a1)

Machine Instruction

0011 0011 0110 1011

0000 0000 0001 0110

0000 0000 0001 1001

a)Source Effective address = contents of a3 + $16temp lt--- M[a3 + $16]

Destination Effective address = contents of a1 + $19M[a1 + $19] lt- temp

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 15

SP NOTES

Micro-steps areMAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | decode |PC lt-- PC + 2 PC points to source displacementMAR lt-- PCMBR lt-- M[MAR] source displacement loadedMAR lt-- [A3 + MBR] source effective address calculatedMBR lt- M[MAR] source data in MBRWR lt- MBR saved temporarily since MBR is to be reusedPC lt-- PC + 2 PC points to destination displacementMAR lt-- PCMBR lt-- M[MAR] destination displacement loaded

MAR lt-- [A1 + MBR] destination effective address calculatedMBR lt- WR source data back to MBR

M[MAR] lt-- MBR source data moved to destination

PC lt-- PC + 2 PC points to next instruction

iv) bgtw label

Machine Instruction

0110 1110 0000 0000 ---- displacement ----

a)MAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | interpret decode |

TRUE FALSE | |

PC lt-- PC + 2 PC lt- PC + 4 MAR lt-- PC MBR lt-- M[MAR] get displacement PC lt-- PC + MBR execute

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 16

SP NOTES

Lecture_no8 Memory unit Register Data of IBM 360

General machine structure of IBM 360370

The IBM System360 (S360) was a mainframe computer system family announced by IBM on April 7 1964 and delivered between 1965 and 1978 It was the first family of computers designed to cover the complete range of applications from small to large both commercial and scientific

(i)Memory unit

Memory (storage) in System360 is addressed in terms of 8-bit bytes Various instructions operate on larger units called halfword (2 bytes) fullword (4 bytes) doubleword (8 bytes) quad word (16 bytes) and 2048 byte storage block specifying the leftmost (lowest address) of the unit Within a halfword fullword doubleword or quadword low numbered bytes are more significant than high numbered bytes this is sometimes referred to as big-endian Many uses for these units require aligning them on the corresponding boundaries Within this article the unqualified term word refers to a fullword

The architecture of System360 provided for up to 224 = 16777216 bytes of memory however the Model 67 extended the architecture and allowed 232 = 4294967296 bytes of virtual memory

(ii)Register-

Registers are areas of storage that are set aside in the processing unit

There are 16 registers in the 370 system (General purpose registerFloating-point registerControl amp status registerdata register address register)

Four fields of information-1) the name field- optional field and must begin with a character from the alphabet can consist of 8 characters2) opcode ndash mandatory field and must be followed by a blank3) operands ndash usually represent data that could be in main storage registers or as part of the instruction itself4) comments ndash used to clarify instructions

(iii)Data

Characters are stored as 8-bit bytes Integers are stored as twos complement binary halfword or fullword values Packed decimal numbers are stored as 1-16 8-bit bytes containing an odd number of decimal digits

followed by a 4-bit sign Sign values of hexadecimal A C E and F are positive and sign values of hexadecimal B and D are negative Digit values of hexadecimal A-F and sign values of 0-9 are invalid but the PACK and UNPK instructions do not test for validity

Zoned decimal numbers are stored as 1-16 8-bit bytes each containing a zone in bits 0-3 and a digit in bits 4-7 The zone of the rightmost byte is interpreted as a sign

Floating point numbers are only stored as fullword or doubleword values on older models On the 36085and 360195there are also extended precision floating point numbers stored as quadwords For all

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 17

SP NOTES

three formats bit 0 is a sign and bits 0-7 are a characteristic (exponent biased by 64) Bits 8-31 (8-63) are a hexadecimal fraction For extended precision the low order doubleword has its own sign and characteristic which are ignored on input and generated on output

Lecture_no9 Various instruction format of IBM 360370

( iv)Instruction

Instructions in the S360 are two four or six bytes in length with the opcode in byte 0 Instructions have one of the following formats

RR (two bytes) R tells that there is data residing in a register So in this type there is data in register R

Generally byte 1 specifies two 4-bit register numbers but in some cases eg SVC byte 1 is a single 8-bit immediate field

RS (four bytes) This instruction can have 2 or 3 operands It depends how many the opcode indicates If it is 2 the first is a register and the second storage If it is 3 then the first and second are registers and the third is a storage location

Byte 1 specifies two register numbers bytes 2-3 specify a base and displacement

RX (four bytes) The X also refers to a storage address however this specifies actual storage addresses in 3 pieces

Byte 1 bits 0-3 specifies either a register number or a modifier byte 1 bits 4-7 specifies the number of the general register to be used as an index bytes 2-3 specify a base and displacement

SI (four bytes) I tells that the operand consists of immediate data (meaning that the data is in the instruction itself) So in this type the first is a storage location and the second is immediate data

Byte 1 specifies an immediate field bytes 2-3 specify a base and displacement

SS (six bytes) S tells that that there is data in storage Here both operands have data in storage

Byte 1 specifies two 4-bit length fields or one 8-bit length field bytes 2-3 and 4-5 each specify a base and displacement The encoding of the length fields is length-1

Instructions must be on a two-byte boundary in memory hence the low-order bit of the instruction address is always 0

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 18

SP NOTES

Addressing Mode

1Implied mode Operand is in a register implied by the OpCode of the instruction ExampleADD 312 Immediate mode Operand is a constant value specified in the instruction itself ExampleADD R 103 Register mode Operand is in register specified in the instruction itself ExampleADD S D

4 Register indirect mode Operand are is in a memory address that is content of a register specifiedby the instruction itself ExampleADD (D) 35 Direct mode Absolute address of operand is explicitly specified in instruction ExampleADD 1234 S6 Indirect mode Absolute address of operand is content of a memory address ExampleADD [1234] 107 Relative mode Content of PC + OpAddr ExampleADD D $S8 Indexed mode Content of an index register + OpAddr ExampleADD D 500(S)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 19

SP NOTES

Lecture_no10 Machine language Long way no looping method

Elementary Instruction Set

Typical Data Transfer Instructions

Typical Arithmetic Instructions

Typical Logical and Bit Manipulation Instr

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 20

SP NOTES

Example Write a program that compare two positive numbers x and y Output is -1 if x lt y 0if x = y +1 if x gt y

MOVE R1 x MOVE R2 y MOVE R3 0 SUB R1 R2 BN Less BZ Zero MOVE R3 +1Less MOVE R3 -1Zero MOVE R3 0 HLT

Lecture_no11 Address modification using instruction as data ampusing index register

Lecture_no12 Looping method with example

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 21

SP NOTES

Lecture_no13 Assembly language machine language

Assembly LanguageAssembly instructions are just shorthand for machine instructions

Machine language Equivalent assembly1000000100100101 LOAD R1 51000000101000101 LOAD R2 51010000100000110 ADD R0 R1 R21000001000000110 SAVE R0 61111111111111111 HALT (For all assembly instructions that compute a result the first argument is the destination)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 22

SP NOTES

Very easy to write an Assembly Language -gtMachine language translator

Exercise

What would be the assembly instructions to swap the contents of registers 1 amp 2Exercise Solution

STORE 1 R1MOVE R1 R2LOAD R2 1HALT

Machine Language

The machine language for a particular computer is tied to the architecture of the CPU For example G4 Macs have a different machine language than Intel PCs We will look at the machine language of a simple simulated computer

Machine Language ExampleOur computer has 4 registers and 32 memory locations Each instruction is 16 bits Here is a machine language program for our simulated computer

10000001001001011000000101000101101000010000011010000010000001101111111111111111

LOAD contents of memorylocation into register

100000010 RR MMMMMex R0 = Mem[3]100000010 00 00011

STORE contents of registerinto memory location

100000100 RR MMMMMex Mem[4] = R0100000100 00 00100

MOVE contents of oneregister into another register

100100010000 RR RRex R0 = R1100100010000 00 01

ADD contents of 2 registersstore result in third

1010000100 RR RR RRex R0 = R1 + R21010000100 00 01 10

SUBTRACT contents of 2registers store result into third

1010001000 RR RR RRex R0 = R1 ndash R2

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 23

SP NOTES

1010001000 00 01 10

Halt the program 1111111111111111

Lecture_no14 Meaning of Assembly language program types Literals

bullUSING

bullBALR

bullSTART

bullEND

bullBR

bullLiteral

bullLTORG

bullBCT

bullEQU ExampleSTART EQU 10

The directive EQU stands for equals it allows you to give a numerical value to a label In this example the assembler would always replace START with 1016 Hence you could write

START EQU 10ORG STARTGOTO 10

bullORG

The directive ORG does not get translated into any code what it does is to set the address at which the next instruction will be stored

Example

ORG 0GOTO 10

The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000

But it will be placed at the address 0 ORG 0 does not get translated into any code but it affects where the code for GOTO 10 is placed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 24

SP NOTES

bullpseudo-ops vs machine-ops

Lecture_no15 Example on Assembly language program

Assembly language is a low-level programming language for a computer or other programmable device specific to a particular computer architecture in contrast to most high-level programming languages which are generally portable across multiple systems Assembly language is converted into executable machine code by a utility program referred to as an assembler like NASM MASM etc

LABEL OPCODE OPERANDS COMMENTS01 02 Program to multiply an integer by the constant 603 Before execution an integer must be stored in NUMBER0405 ORIG x305006 LEA R2 NUMBER07 LDW R2 R2 008 LEA R1 SIX09 LDW R1 R1 00A AND R3R30 Clear R3 It will0B contain the product0C The inner loop0D 0E AGAIN ADD R3R3R20F ADD R1R1-1 R1 keeps track of10 BRp AGAIN the iterations11 12 HALT13

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 25

SP NOTES

14 NUMBER BLKW 115 SIX FILL x000616 17 END

Figure 71 An assembly language programTwo of the parts (LABEL and COMMENTS) are optionalOpcodes and OperandsTwo of the parts (OPCODE and OPERANDS) are mandatory The OPCODE is a symbolic name for the opcode

The number of operands depends on the operation being performedAGAIN ADD R3R3R2The operands to be added are obtained from register 2 and from register 3 The resultis to be placed in register 3 We represent each of the registers 0 through 7 as R0 R2helliphelliphelliphellip R7

The location whose address is to be read is given the label NUMBER The destination into which that address is to be loaded is register 2LEA R2 NUMBERA literal value must contain a symbol identifying the representation base of the number We use for decimal x for hexadecimal and b for binary

LabelsLabels are symbolic names which are used to identify memory locations that are referred to explicitly in the programThere are two reasons for explicitly referring to a memory location1 The location contains the target of a branch instruction (for example AGAIN in line 0E)2 The location contains a value that is loaded or stored (for example NUMBER line 14 and SIX line 15)The location AGAIN is specifically referenced by the branch instruction in line 10

BRp AGAIN

If the result of ADD R1R1ndash1 is positive (as evidenced by the P condition code being set) then the program branches to the location explicitly referenced as AGAIN to perform another iterationThe value stored in the memory location explicitly referenced as NUMBER is loaded into R2If a location in the program is not explicitly referenced then there is no need to give it a label

CommentsComments are messages intended only for human consumption The purpose of comments is to make the program more comprehensible to the human reader

Pseudo-ops (Assembler Directives)

Actually a more formal name for a pseudo-op is assembler directive They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution Rather the pseudo-op is strictly a message to the assembler to help the assembler in the assembly processORIG

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 26

SP NOTES

ORIG tells the assembler where in memory to place the program In line 05 ORIG x3050 says start with location x3050 As a result the LEA R2NUMBER instruction will be put in location x3050FILLFILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand In line 15 the ninth location in the resultant program is initialized to the value x0006BLKWBLKW tells the assembler to set aside some number of sequential memory locations (ie a BLocK ofWords) in the program The actual number is the operand of the BLKW pseudo-op In line 11 the pseudo-op instructs the assembler to set aside one location in memory

Lecture_no16 surprise test

Lecture_no17 MODULE-2

AssemblerWhy learning AssemblerAssembler or other languages that is the question Why should I learn another language if I alreadylearned other programming languages The best argument while you live in France you are able to getthrough by speaking english but you will never feel at home then and life remains complicated You canget through with this but it is rather inappropriate If things need a hurry you should use the countryslanguageAssemblerTo translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmerDisassenblerTo translate machine code to their assembly source

Cross AssemblerThe assembler may run on one machine and produce object code for another

An assembly is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader

(Fig Assembler)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 27

SP NOTES

For example the externally defined symbols (library program) must be indicated to the loaderthe assembler does not know the address of these symbols and it is up to the loader to find theprograms containing them load them into memory and place the values of these symbols in thecalling program

(Fig Program Translation Steps)

Fundamental assembler functions1048698 Translate mnemonic operation codes their machine language equivalents1048698 Assign machine addresses to symbolic labels used by the programmer

Basic Assembler Functions

Convert mnemonic operation codes to machine language equivalents

Convert symbolic operands to machine addresses (pass 1)

Build machine instructions in the proper format

Convert data constants to internal (machine)representations

Error checking is provided

Write the object program and assembly listing files

Assembler directives

-gtThe assembler can also process assembler directivesbull Assembler directives (or pseudo-instructions) provide instructions to the assembler itself Theyare not translated into machine instructions

-gtPseudo-Instructions Not translated into machine instructions Providing information to the assembler-gtBasic assembler directives-

bull START -Specify name and staring address for the program END- Indicate the end of the source program and (optionally) specify the first executable instruction in

the program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 28

SP NOTES

BYTE -Generate character or hexadecimal constant occupying as many bytes as needed to represent the constant

WORD- Generate one-word integer constant RESB- Reserve the indicated number of bytes for a data area RESW- Reserve the indicated number of words for a data area

Lecture_no18 General design procedure of 360-type Assembler

Syntax Assembly language statements

Assembly language statements are entered one statement per line Each statement follows the following format

[label] mnemonic [operands] [comment]

The fields in the square brackets are optional A basic instruction has two parts the first one is the name of the instruction (or the mnemonic) which is to be executed and the second are the operands or the parameters of the command

An assembly language statement contains the following fields -gt Label Field -it is an identifier amp optional fieldThe maximum lengnth of label differs between assemblers It can be used to define a symbolSome accepts upto 32 characters long other only 4 charactersA label when declared is suffixed by a colon and begins with valid characters(AhelliphellipZ)

Ex-

START LDAA 24 H

Here Label START is equal to the address of the instruction LDAA 24 H -gt MnemonicOpcode -This field contains a mnemonic It stands for operation code or pseudo-opie a machine code instruction it requires additional information ieoperands -gt Operand Field -it specifies either the address or the data that the opcode requires -gt Comment Field allows the programmer to document the software This field is optional and is only printed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 29

SP NOTES

on the source listing for documentation purposes The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character

Types of Assembly language statements

It consists of 3 typeseg

(i)Imperative statement

(ii)Declarative statements

(iii)Assembler directive statements

Advantages amp disadvantages of Assembly language

Lecture_no19 Design Specification of 1-pass assembler with example

Following steps are used for design specification of an assembler-1)Identify the information necessary to perform a task2)Specify the data structure and define format of data structure3)Determine the processing necessary to obtain and maintain the information4)Determine the processing necessary to perform a task

Assembler Design can be done in

ndashSingle pass

ndashTwo pass

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 30

Source program

Pass1

OPTAB SYMTAB

Intermediate file

Pass2

SYMTAB

Object codes

OPTAB

SP NOTES

The Intermediate file include each source statement assigned address and error indicator

Pass I

Separate the symbols mnemonic op-code and operational fields Determine the storage requirement for every assembly language statement and up date the location

counter Build the symbol table (Table that is used to store each label and its corresponding value)

Pass II

Generate object code(perform actual translation)

One-Pass Assembler

The translation performed by an assembler is essentially a collection of substitutions machine operation code

for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines

bullSingle Pass AssemblerndashDoes everything in single passndashCannot resolve the forward referencing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 31

LOCCTR

SP NOTES

(detailed pass1 assembler flowchart)

Internal data structures1048698 the Operation Code Table (OPTAB) bull OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents1048698 the Symbol Table (SYMTAB) bull SYMTAB is used to store values (addresses) assigned to labels1048698 Location Counter (LOCCTR) bull This is a variable that is used to help in the assignment of address LOCCTR not LOCCTR + (instruction length)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 32

SP NOTES

Lecture_no20 Multi-pass assembler with example

Pass 1 Assign addresses to all statements in the program Save the values assigned to all labels for use in Pass 2 Perform some processing of assembler directivesPass 2 Assemble instructions Generate data values defined by BYTE WORD Perform processing of assembler directives not done in Pass 1 Write the object program and the assembly listing

(Detailed pass2 assembler flowchart)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 33

SP NOTES

Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler

The difference between one pass and two pass assemblers are-A one pass assembler passes over the source file exactly once in the same pass collecting the labels resolving future references and doing the actual assembly The difficult part is to resolve future label references and assemble code in one pass

A two pass assembler does two passes over the source file ( the second pass can be over a file generated in the first pass ) In the first pass all it does is looks for label definitions and introduces them in the symbol table In the second pass after the symbol table is complete it does the actual assembly by translating the operations and so on

Assembler Design Machine Dependent Assembler Features instruction formats and addressing modes program relocation

Machine Independent Assembler Features literals symbol-defining statements expressions program blocks control sections and program linking

Object Program

Header Col 1 H Col 2~7 Program name Col 8~13 Starting address (hex) Col 14-19 Length of object program in bytes (hex) Text Col1 T Col2~7 Starting address in this record (hex) Col 8~9 Length of object code in this record in bytes (hex) Col 10~69Object code (69-10+1)6=10 instructions End Col1 E Col 2~7 Address of 16 Col2 first executable instruction (hex) (END program_name)

Absolute ProgramThe program must be loaded at the specified (absolute) address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 34

SP NOTES

Relocatable Program

The actual starting address is not known until load time The program can be loaded at different address of memory

How to Solvea Modification recordb Base relativec Program counter relative

Modification record Col 1 M Col 2-7 Starting location of the address field to be modified relative to the beginning of the program Col 8-9 length of the address field to be modified in halfbytes

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 35

SP NOTES

Lecture_no22 Macro language amp Macro processor

MACRO PROCESSOR

The assembly language programmer often finds it necessary to repeat some statements or block of code several times in a program The block may consist of code to swap sets of registers do some arithmetic operations In this situation the programmer find a macro instruction facility useful Macro instruction (often called macros) are single line abbreviation for group of instructions In employing a macro the programmer essentially defines a single instruction to represent a block of code For every occurrence of this one-line macro instruction in his program the macro processing assembler substitute the entire block

A macro represents a commonly used group of statements in the source pro ramming language The macro processor replaces each macro instruction with the corresponding group of source language statements This is called expanding of macros

Obj

Basic Macro Processor Functions

Macro Definition and Usage

In this section we will present one more example to highlight salient aspects of macro-processor The example is very similar to Intels 8 bit microprocessor assembly language instruction

ExampleMACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 36

Source code (with macro)

Macro Processor

Expanded code

Compiler Assembler

SP NOTES

A macro definition is placed at the start of a program enclosed between the statements MACRO and ENDM A MACRO statement indicates that a macro definition starts while statement ENDM indicates the end of a macro definition Thus a group of statements start ing with MACRO and ending with ENDM constitutes one macro definition unit If many macros are to be defined in a program as many definition

modules will exist at the start of the program Each definition module contains a new operation and defines it to consist of a sequence of assembly language statement In example above INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence

The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement Appearance of a macro name in the mnemonic field amounts to a call on the macro The assembler replaces such a statement by the statement sequence comprising the macro This is known as macro expansion

INCRMT XY

is shown to lead to insertion of the assembly statements

LOAD X

ADD Y

STORE X

in its place All macro calls in a program are expanded in this fashion

Defining a Macro

Let us take another look at the macro definition unit

MACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

The macro header statement indicates the existence of a macro definition unit Absence of the header statement as the first statement of a program or ft first statement following a macro definition unit signals the start of the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 37

SP NOTES

main assembly language program The next statement in the definition unit is die prototype for a macro call This statement names the macro and indicates how the operands in any call on the macro would be written

The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion

Basic Macro Processor Functions

-gtDirectives used during usage of Macro

Macro Indicates begin of Macro MEND indicates end of Macro

-gtPrototype for Macro

Each argument starts with Name and macro Parameter list

MEND

-gtBODY The statement will be generated as the expansion of Macro

Macro Expansion

Each macro invocation statement will be expanded into the statements that form the body of the macro

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 38

SP NOTES

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)

In the definition of macro parameter In the macro invocation argument

Lecture_no23 Machine-Independent Macro Processor Features

MACRO INVOCATION

A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro

macro_name p1 p2 hellip

Difference between macro call and procedure call

Macro call statements of the macro body are expanded each time the macro is invoked

Procedure call statements of the subroutine appear only one regardless of how many times the subroutine is called

Macro parameters

i) Positional parameter

The prototype statement indicates how operands in a macro call would be written These operands are called parameters or arguments All parameters used in the prototype statement have names starting with the special character amp These parameters are known as formal parameters A macro call is written using parameter names which do not start with ft special character amp These are known as actual parameters

The lists of formal and actual parameters also called as formal and actual parameter lists specified in the prototype and macro call statements respectively establish a correspondence between each formal parameter and an actual parameter In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists Thus the first actual parameter in the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 39

SP NOTES

fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again

INCRMT ampA ampB prototype

INCRMT XY macro call

We see that X would be paired with ampA and Y with ampB While expanding a macro call any formal parameter appearing within a model statement is replaced by the corresponding actual parameter This is how expansion of the call INCR XY heads to the following statements

LOAD X

ADD Y

STORE X

Example

STRG DATA1 DATA2 DATA3

GENER DIRECT3

(ii) Keyword parameter

Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter

ExampleSTRG ampa3=DATA1 ampa2=DATA2 ampa1=DATA3

GENER TYPE=DIRECT CHANNEL=3

Features of macro facility

(1)Macro instruction argument

(2)Macro calls within macro(nested macro calls)

(3)Macro instruction defining(macros within macro)

(4)Conditional macro ExpansionLecture_no24 Design of 1-pass macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 40

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Lecture_no4 Foundation of system programming

1) Formal Systembull A formal system is an un interpreted calculus

It consists of

ndash Alphabets

ndash A set of words called Axioms

ndash Finite set of relations called rules of inference or production rules

ndash Ex Boolean algebra

2) Need Of System Software-The basic need of system software is to achieve the following goals -

bull To achieve efficient performance of the system

bull To make effective execution of general user program

bull To make effective utilization of human resources

bull To make available new better facilities

Need of system programming

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 7

SP NOTES

Lecture_no5 Evolution of operating systemGeneral machine structure

OSJobmultiprogrammingMultitasking

FragmentationPagingScheduling

Facilities of computer os

Operating Systembull It is the collection of system programs which acts as an interface between user and the computer and computer hardware

bull The purpose of an operating system is to provide an environment in which A user can execute programs in a convenient manner

Functions of Operating System

bull File handling and management

bull Storage management (Memory management)

bull Device scheduling and management

bull CPU scheduling

bull Information management

bull Process control (management)

bull Error handling

bull Protecting itself from user amp protecting user from other users

What is MFT

What is MVT

What is Fragmentation

What is paging Explain its various types

What is scheduling

What is meant by the term time sharing

What is virtual memoryWhat are simple batched systems Explain

Give an evolution of an operating system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 8

SP NOTES

Lecture_no6 Description of Von-Neumann machines components

General machine structureVon-Neumannrsquos componentsBlock diagram and functions of each component of Von-neumannrsquos machine

1 The von Neumann Computer Modelbull Von Neumann computer systems contain three main building blocks-gt the central processing unit (CPU)-gt memory-gt and inputoutput devices (IO)bull These three components are connected together using the system busbull The most prominent items within the CPU are the registers they can be manipulated directly by acomputer program The following block diagram shows major relationship between CPU components

(Design of the VonNeumann architecture)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 9

SP NOTES

2 Components of the Von Neumann Model-gt Memory Storage of information (dataprogram)-gt Processing Unit ComputationProcessing of Information-gt Input Means of getting information into the computer eg keyboard mouse-gtOutput Means of getting information out of the computer eg printer monitor bull Control Unit Makes sure that all the other parts perform their tasks correctly and at the correct time The von Neumann Machine

(connection between CPU amp Main memory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 10

SP NOTES

MBR (Memory Buffer Register) = MBR contains a word to be stored in memory or is sued to receive from memory

MAR (Memory Address Register) = MAR specifies the address in memory of the word to be written from or read into the MBR

IR (Instruction Register) = IR contains the 8-bit op-code instruction being executedIBR (Instruction Buffer Register) = IBR is employed to hold temporarily the right-hand instructions

from a word in memoryPC (Program Counter) = PC contains the address of the next instruction-pair to be fetched from

memoryAC and MQ (Accumulator and Multiplier Quotient) = AC and MQ are employed to hold temporarily

operands and results of ALU operations

Fetch The process of retrieving a value from a memory location in order to put it in a register or pass it to a processing unit

IO controller A special-purpose processor that mediates between the main computer processor and a specific input or output device The processor records the request made by the main processor leaving it free to continue other tasks and interrupts the main processor when the input or output task is complete

Input Output The subsystem responsible for managing communications with external devices whether external memory storage other processors or devices for human interaction

Instruction set The set of codes that describe all the legal operations a particular computer can execute

Op code An unsigned binary integer that is assigned to a specific task the hardware can perform

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 11

SP NOTES

Register A special kind of very fast memory which is referred to by name rather than by a memory address number Registers often hold data values that are currently in use by the ALU or other subsystems of the architecture

Random access memory (RAM) The typical computer memory where each cell is addressed individually and may be updated or accessed in time that is independent of the cellrsquos location

Read-only memory (ROM) A memory that is usually very similar to RAM except it may only be accessed not updated

Store The process of writing a value to a memory cell

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 12

SP NOTES

Lecture_no7 Instruction format with microflowchart for ADD instruction

Flowchart for instruction fetchamp execution of general machine structure

Typical operating system Information flow in microflowchart for ADDSUBMULT

Memory operations

FETCH (addr)

1Put addr into MAR2Tell memory unit to ldquoloadrdquo3Memory copies data into MDR

STORE(addrnew-value)

1 Put addr into MAR2 Put new-value into MDR3 Tell memory unit to ldquostorerdquo4 Memory stores data from MDR into memory cell

Problem 1)

a) Write an assembly program for a GPR machine having 8 general purpose registers (R1R7) to evaluate the following expression Ignore the remainder part of the division process

( (a + b) (a + d) ) (a e)

where abde are variables with addresses ABDE respectively and each operation is defined as follows + --------gt ADD Ri Rj ( Rj lt- Ri + Rj) --------gt MULT Ri Rj ( Rj lt- Ri Rj) --------gt DIV Ri Rj ( Rj lt- Rj Ri)

a = M[A] ------gt content of memory location at address A same for B D E

move A D0 D0 = amove B D4 D4 = badd D0 D4 D4 = a + bmove D D1 D1 = dadd D0 D1 D1 = a + ddiv D1D4 D4 = (a + b) (a + d)move E D2 D2 = emult D0 D2 D2 = (a e)mult D4 D2 D2 = ( (a + b) (a + d) ) (a e)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 13

SP NOTES

Problem 2)

a) Provide micro-instructions in the form of detailed flowchart for the execution of the following instructions

b) Show the flow of the micro-instructions within the GPR architecture by highlighting the system buses and the registers used

i)move X D0 move contents of memory location X to register D0ii)move D0 X move contents of register D0 to memory location Xiii)movea 22(a3)25(a1) move contents from memory to memoryiv)bgtw label branch if greater than to location ldquolabelrdquo

Answer 2)

For each of the above for part b) draw the CPU consisting of all the registers (for example MAR MBR PC IR Address and Data Registers ALU) and Memory and show each of the micro-instruction as in part a)

i) move X D0Machine Instruction

0011 0000 0011 1000

Address of X

a) Micro-steps are

MAR lt-- PC

MBR lt-- M[MAR] fetch

IR lt-- MBR

| decode |PC lt-- PC + 2 PC points to source addressMAR lt-- PCMBR lt-- M[MAR] address of X in MBRMAR MBR move X to MARMBR M[MAR] read contents of location XD0 MBR PC lt-- PC + 2 PC points to next instruction

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 14

SP NOTES

ii) move D0 X

Machine Instruction

0011 1110 0000 0000

Address of X

a) Micro-steps areMAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | decode |PC lt-- PC + 2 PC points to destination addressMAR lt-- PCMBR M[MAR] read destination address XMAR MBR MAR has X now

MBR D0 source data in MBR

M[MAR] MBR data in Effective address X (which is in MAR)

PC lt-- PC + 2 PC points to next instruction

iii) movea 22(a3)25(a1)

Machine Instruction

0011 0011 0110 1011

0000 0000 0001 0110

0000 0000 0001 1001

a)Source Effective address = contents of a3 + $16temp lt--- M[a3 + $16]

Destination Effective address = contents of a1 + $19M[a1 + $19] lt- temp

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 15

SP NOTES

Micro-steps areMAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | decode |PC lt-- PC + 2 PC points to source displacementMAR lt-- PCMBR lt-- M[MAR] source displacement loadedMAR lt-- [A3 + MBR] source effective address calculatedMBR lt- M[MAR] source data in MBRWR lt- MBR saved temporarily since MBR is to be reusedPC lt-- PC + 2 PC points to destination displacementMAR lt-- PCMBR lt-- M[MAR] destination displacement loaded

MAR lt-- [A1 + MBR] destination effective address calculatedMBR lt- WR source data back to MBR

M[MAR] lt-- MBR source data moved to destination

PC lt-- PC + 2 PC points to next instruction

iv) bgtw label

Machine Instruction

0110 1110 0000 0000 ---- displacement ----

a)MAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | interpret decode |

TRUE FALSE | |

PC lt-- PC + 2 PC lt- PC + 4 MAR lt-- PC MBR lt-- M[MAR] get displacement PC lt-- PC + MBR execute

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 16

SP NOTES

Lecture_no8 Memory unit Register Data of IBM 360

General machine structure of IBM 360370

The IBM System360 (S360) was a mainframe computer system family announced by IBM on April 7 1964 and delivered between 1965 and 1978 It was the first family of computers designed to cover the complete range of applications from small to large both commercial and scientific

(i)Memory unit

Memory (storage) in System360 is addressed in terms of 8-bit bytes Various instructions operate on larger units called halfword (2 bytes) fullword (4 bytes) doubleword (8 bytes) quad word (16 bytes) and 2048 byte storage block specifying the leftmost (lowest address) of the unit Within a halfword fullword doubleword or quadword low numbered bytes are more significant than high numbered bytes this is sometimes referred to as big-endian Many uses for these units require aligning them on the corresponding boundaries Within this article the unqualified term word refers to a fullword

The architecture of System360 provided for up to 224 = 16777216 bytes of memory however the Model 67 extended the architecture and allowed 232 = 4294967296 bytes of virtual memory

(ii)Register-

Registers are areas of storage that are set aside in the processing unit

There are 16 registers in the 370 system (General purpose registerFloating-point registerControl amp status registerdata register address register)

Four fields of information-1) the name field- optional field and must begin with a character from the alphabet can consist of 8 characters2) opcode ndash mandatory field and must be followed by a blank3) operands ndash usually represent data that could be in main storage registers or as part of the instruction itself4) comments ndash used to clarify instructions

(iii)Data

Characters are stored as 8-bit bytes Integers are stored as twos complement binary halfword or fullword values Packed decimal numbers are stored as 1-16 8-bit bytes containing an odd number of decimal digits

followed by a 4-bit sign Sign values of hexadecimal A C E and F are positive and sign values of hexadecimal B and D are negative Digit values of hexadecimal A-F and sign values of 0-9 are invalid but the PACK and UNPK instructions do not test for validity

Zoned decimal numbers are stored as 1-16 8-bit bytes each containing a zone in bits 0-3 and a digit in bits 4-7 The zone of the rightmost byte is interpreted as a sign

Floating point numbers are only stored as fullword or doubleword values on older models On the 36085and 360195there are also extended precision floating point numbers stored as quadwords For all

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 17

SP NOTES

three formats bit 0 is a sign and bits 0-7 are a characteristic (exponent biased by 64) Bits 8-31 (8-63) are a hexadecimal fraction For extended precision the low order doubleword has its own sign and characteristic which are ignored on input and generated on output

Lecture_no9 Various instruction format of IBM 360370

( iv)Instruction

Instructions in the S360 are two four or six bytes in length with the opcode in byte 0 Instructions have one of the following formats

RR (two bytes) R tells that there is data residing in a register So in this type there is data in register R

Generally byte 1 specifies two 4-bit register numbers but in some cases eg SVC byte 1 is a single 8-bit immediate field

RS (four bytes) This instruction can have 2 or 3 operands It depends how many the opcode indicates If it is 2 the first is a register and the second storage If it is 3 then the first and second are registers and the third is a storage location

Byte 1 specifies two register numbers bytes 2-3 specify a base and displacement

RX (four bytes) The X also refers to a storage address however this specifies actual storage addresses in 3 pieces

Byte 1 bits 0-3 specifies either a register number or a modifier byte 1 bits 4-7 specifies the number of the general register to be used as an index bytes 2-3 specify a base and displacement

SI (four bytes) I tells that the operand consists of immediate data (meaning that the data is in the instruction itself) So in this type the first is a storage location and the second is immediate data

Byte 1 specifies an immediate field bytes 2-3 specify a base and displacement

SS (six bytes) S tells that that there is data in storage Here both operands have data in storage

Byte 1 specifies two 4-bit length fields or one 8-bit length field bytes 2-3 and 4-5 each specify a base and displacement The encoding of the length fields is length-1

Instructions must be on a two-byte boundary in memory hence the low-order bit of the instruction address is always 0

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 18

SP NOTES

Addressing Mode

1Implied mode Operand is in a register implied by the OpCode of the instruction ExampleADD 312 Immediate mode Operand is a constant value specified in the instruction itself ExampleADD R 103 Register mode Operand is in register specified in the instruction itself ExampleADD S D

4 Register indirect mode Operand are is in a memory address that is content of a register specifiedby the instruction itself ExampleADD (D) 35 Direct mode Absolute address of operand is explicitly specified in instruction ExampleADD 1234 S6 Indirect mode Absolute address of operand is content of a memory address ExampleADD [1234] 107 Relative mode Content of PC + OpAddr ExampleADD D $S8 Indexed mode Content of an index register + OpAddr ExampleADD D 500(S)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 19

SP NOTES

Lecture_no10 Machine language Long way no looping method

Elementary Instruction Set

Typical Data Transfer Instructions

Typical Arithmetic Instructions

Typical Logical and Bit Manipulation Instr

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 20

SP NOTES

Example Write a program that compare two positive numbers x and y Output is -1 if x lt y 0if x = y +1 if x gt y

MOVE R1 x MOVE R2 y MOVE R3 0 SUB R1 R2 BN Less BZ Zero MOVE R3 +1Less MOVE R3 -1Zero MOVE R3 0 HLT

Lecture_no11 Address modification using instruction as data ampusing index register

Lecture_no12 Looping method with example

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 21

SP NOTES

Lecture_no13 Assembly language machine language

Assembly LanguageAssembly instructions are just shorthand for machine instructions

Machine language Equivalent assembly1000000100100101 LOAD R1 51000000101000101 LOAD R2 51010000100000110 ADD R0 R1 R21000001000000110 SAVE R0 61111111111111111 HALT (For all assembly instructions that compute a result the first argument is the destination)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 22

SP NOTES

Very easy to write an Assembly Language -gtMachine language translator

Exercise

What would be the assembly instructions to swap the contents of registers 1 amp 2Exercise Solution

STORE 1 R1MOVE R1 R2LOAD R2 1HALT

Machine Language

The machine language for a particular computer is tied to the architecture of the CPU For example G4 Macs have a different machine language than Intel PCs We will look at the machine language of a simple simulated computer

Machine Language ExampleOur computer has 4 registers and 32 memory locations Each instruction is 16 bits Here is a machine language program for our simulated computer

10000001001001011000000101000101101000010000011010000010000001101111111111111111

LOAD contents of memorylocation into register

100000010 RR MMMMMex R0 = Mem[3]100000010 00 00011

STORE contents of registerinto memory location

100000100 RR MMMMMex Mem[4] = R0100000100 00 00100

MOVE contents of oneregister into another register

100100010000 RR RRex R0 = R1100100010000 00 01

ADD contents of 2 registersstore result in third

1010000100 RR RR RRex R0 = R1 + R21010000100 00 01 10

SUBTRACT contents of 2registers store result into third

1010001000 RR RR RRex R0 = R1 ndash R2

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 23

SP NOTES

1010001000 00 01 10

Halt the program 1111111111111111

Lecture_no14 Meaning of Assembly language program types Literals

bullUSING

bullBALR

bullSTART

bullEND

bullBR

bullLiteral

bullLTORG

bullBCT

bullEQU ExampleSTART EQU 10

The directive EQU stands for equals it allows you to give a numerical value to a label In this example the assembler would always replace START with 1016 Hence you could write

START EQU 10ORG STARTGOTO 10

bullORG

The directive ORG does not get translated into any code what it does is to set the address at which the next instruction will be stored

Example

ORG 0GOTO 10

The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000

But it will be placed at the address 0 ORG 0 does not get translated into any code but it affects where the code for GOTO 10 is placed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 24

SP NOTES

bullpseudo-ops vs machine-ops

Lecture_no15 Example on Assembly language program

Assembly language is a low-level programming language for a computer or other programmable device specific to a particular computer architecture in contrast to most high-level programming languages which are generally portable across multiple systems Assembly language is converted into executable machine code by a utility program referred to as an assembler like NASM MASM etc

LABEL OPCODE OPERANDS COMMENTS01 02 Program to multiply an integer by the constant 603 Before execution an integer must be stored in NUMBER0405 ORIG x305006 LEA R2 NUMBER07 LDW R2 R2 008 LEA R1 SIX09 LDW R1 R1 00A AND R3R30 Clear R3 It will0B contain the product0C The inner loop0D 0E AGAIN ADD R3R3R20F ADD R1R1-1 R1 keeps track of10 BRp AGAIN the iterations11 12 HALT13

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 25

SP NOTES

14 NUMBER BLKW 115 SIX FILL x000616 17 END

Figure 71 An assembly language programTwo of the parts (LABEL and COMMENTS) are optionalOpcodes and OperandsTwo of the parts (OPCODE and OPERANDS) are mandatory The OPCODE is a symbolic name for the opcode

The number of operands depends on the operation being performedAGAIN ADD R3R3R2The operands to be added are obtained from register 2 and from register 3 The resultis to be placed in register 3 We represent each of the registers 0 through 7 as R0 R2helliphelliphelliphellip R7

The location whose address is to be read is given the label NUMBER The destination into which that address is to be loaded is register 2LEA R2 NUMBERA literal value must contain a symbol identifying the representation base of the number We use for decimal x for hexadecimal and b for binary

LabelsLabels are symbolic names which are used to identify memory locations that are referred to explicitly in the programThere are two reasons for explicitly referring to a memory location1 The location contains the target of a branch instruction (for example AGAIN in line 0E)2 The location contains a value that is loaded or stored (for example NUMBER line 14 and SIX line 15)The location AGAIN is specifically referenced by the branch instruction in line 10

BRp AGAIN

If the result of ADD R1R1ndash1 is positive (as evidenced by the P condition code being set) then the program branches to the location explicitly referenced as AGAIN to perform another iterationThe value stored in the memory location explicitly referenced as NUMBER is loaded into R2If a location in the program is not explicitly referenced then there is no need to give it a label

CommentsComments are messages intended only for human consumption The purpose of comments is to make the program more comprehensible to the human reader

Pseudo-ops (Assembler Directives)

Actually a more formal name for a pseudo-op is assembler directive They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution Rather the pseudo-op is strictly a message to the assembler to help the assembler in the assembly processORIG

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 26

SP NOTES

ORIG tells the assembler where in memory to place the program In line 05 ORIG x3050 says start with location x3050 As a result the LEA R2NUMBER instruction will be put in location x3050FILLFILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand In line 15 the ninth location in the resultant program is initialized to the value x0006BLKWBLKW tells the assembler to set aside some number of sequential memory locations (ie a BLocK ofWords) in the program The actual number is the operand of the BLKW pseudo-op In line 11 the pseudo-op instructs the assembler to set aside one location in memory

Lecture_no16 surprise test

Lecture_no17 MODULE-2

AssemblerWhy learning AssemblerAssembler or other languages that is the question Why should I learn another language if I alreadylearned other programming languages The best argument while you live in France you are able to getthrough by speaking english but you will never feel at home then and life remains complicated You canget through with this but it is rather inappropriate If things need a hurry you should use the countryslanguageAssemblerTo translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmerDisassenblerTo translate machine code to their assembly source

Cross AssemblerThe assembler may run on one machine and produce object code for another

An assembly is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader

(Fig Assembler)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 27

SP NOTES

For example the externally defined symbols (library program) must be indicated to the loaderthe assembler does not know the address of these symbols and it is up to the loader to find theprograms containing them load them into memory and place the values of these symbols in thecalling program

(Fig Program Translation Steps)

Fundamental assembler functions1048698 Translate mnemonic operation codes their machine language equivalents1048698 Assign machine addresses to symbolic labels used by the programmer

Basic Assembler Functions

Convert mnemonic operation codes to machine language equivalents

Convert symbolic operands to machine addresses (pass 1)

Build machine instructions in the proper format

Convert data constants to internal (machine)representations

Error checking is provided

Write the object program and assembly listing files

Assembler directives

-gtThe assembler can also process assembler directivesbull Assembler directives (or pseudo-instructions) provide instructions to the assembler itself Theyare not translated into machine instructions

-gtPseudo-Instructions Not translated into machine instructions Providing information to the assembler-gtBasic assembler directives-

bull START -Specify name and staring address for the program END- Indicate the end of the source program and (optionally) specify the first executable instruction in

the program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 28

SP NOTES

BYTE -Generate character or hexadecimal constant occupying as many bytes as needed to represent the constant

WORD- Generate one-word integer constant RESB- Reserve the indicated number of bytes for a data area RESW- Reserve the indicated number of words for a data area

Lecture_no18 General design procedure of 360-type Assembler

Syntax Assembly language statements

Assembly language statements are entered one statement per line Each statement follows the following format

[label] mnemonic [operands] [comment]

The fields in the square brackets are optional A basic instruction has two parts the first one is the name of the instruction (or the mnemonic) which is to be executed and the second are the operands or the parameters of the command

An assembly language statement contains the following fields -gt Label Field -it is an identifier amp optional fieldThe maximum lengnth of label differs between assemblers It can be used to define a symbolSome accepts upto 32 characters long other only 4 charactersA label when declared is suffixed by a colon and begins with valid characters(AhelliphellipZ)

Ex-

START LDAA 24 H

Here Label START is equal to the address of the instruction LDAA 24 H -gt MnemonicOpcode -This field contains a mnemonic It stands for operation code or pseudo-opie a machine code instruction it requires additional information ieoperands -gt Operand Field -it specifies either the address or the data that the opcode requires -gt Comment Field allows the programmer to document the software This field is optional and is only printed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 29

SP NOTES

on the source listing for documentation purposes The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character

Types of Assembly language statements

It consists of 3 typeseg

(i)Imperative statement

(ii)Declarative statements

(iii)Assembler directive statements

Advantages amp disadvantages of Assembly language

Lecture_no19 Design Specification of 1-pass assembler with example

Following steps are used for design specification of an assembler-1)Identify the information necessary to perform a task2)Specify the data structure and define format of data structure3)Determine the processing necessary to obtain and maintain the information4)Determine the processing necessary to perform a task

Assembler Design can be done in

ndashSingle pass

ndashTwo pass

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 30

Source program

Pass1

OPTAB SYMTAB

Intermediate file

Pass2

SYMTAB

Object codes

OPTAB

SP NOTES

The Intermediate file include each source statement assigned address and error indicator

Pass I

Separate the symbols mnemonic op-code and operational fields Determine the storage requirement for every assembly language statement and up date the location

counter Build the symbol table (Table that is used to store each label and its corresponding value)

Pass II

Generate object code(perform actual translation)

One-Pass Assembler

The translation performed by an assembler is essentially a collection of substitutions machine operation code

for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines

bullSingle Pass AssemblerndashDoes everything in single passndashCannot resolve the forward referencing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 31

LOCCTR

SP NOTES

(detailed pass1 assembler flowchart)

Internal data structures1048698 the Operation Code Table (OPTAB) bull OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents1048698 the Symbol Table (SYMTAB) bull SYMTAB is used to store values (addresses) assigned to labels1048698 Location Counter (LOCCTR) bull This is a variable that is used to help in the assignment of address LOCCTR not LOCCTR + (instruction length)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 32

SP NOTES

Lecture_no20 Multi-pass assembler with example

Pass 1 Assign addresses to all statements in the program Save the values assigned to all labels for use in Pass 2 Perform some processing of assembler directivesPass 2 Assemble instructions Generate data values defined by BYTE WORD Perform processing of assembler directives not done in Pass 1 Write the object program and the assembly listing

(Detailed pass2 assembler flowchart)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 33

SP NOTES

Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler

The difference between one pass and two pass assemblers are-A one pass assembler passes over the source file exactly once in the same pass collecting the labels resolving future references and doing the actual assembly The difficult part is to resolve future label references and assemble code in one pass

A two pass assembler does two passes over the source file ( the second pass can be over a file generated in the first pass ) In the first pass all it does is looks for label definitions and introduces them in the symbol table In the second pass after the symbol table is complete it does the actual assembly by translating the operations and so on

Assembler Design Machine Dependent Assembler Features instruction formats and addressing modes program relocation

Machine Independent Assembler Features literals symbol-defining statements expressions program blocks control sections and program linking

Object Program

Header Col 1 H Col 2~7 Program name Col 8~13 Starting address (hex) Col 14-19 Length of object program in bytes (hex) Text Col1 T Col2~7 Starting address in this record (hex) Col 8~9 Length of object code in this record in bytes (hex) Col 10~69Object code (69-10+1)6=10 instructions End Col1 E Col 2~7 Address of 16 Col2 first executable instruction (hex) (END program_name)

Absolute ProgramThe program must be loaded at the specified (absolute) address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 34

SP NOTES

Relocatable Program

The actual starting address is not known until load time The program can be loaded at different address of memory

How to Solvea Modification recordb Base relativec Program counter relative

Modification record Col 1 M Col 2-7 Starting location of the address field to be modified relative to the beginning of the program Col 8-9 length of the address field to be modified in halfbytes

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 35

SP NOTES

Lecture_no22 Macro language amp Macro processor

MACRO PROCESSOR

The assembly language programmer often finds it necessary to repeat some statements or block of code several times in a program The block may consist of code to swap sets of registers do some arithmetic operations In this situation the programmer find a macro instruction facility useful Macro instruction (often called macros) are single line abbreviation for group of instructions In employing a macro the programmer essentially defines a single instruction to represent a block of code For every occurrence of this one-line macro instruction in his program the macro processing assembler substitute the entire block

A macro represents a commonly used group of statements in the source pro ramming language The macro processor replaces each macro instruction with the corresponding group of source language statements This is called expanding of macros

Obj

Basic Macro Processor Functions

Macro Definition and Usage

In this section we will present one more example to highlight salient aspects of macro-processor The example is very similar to Intels 8 bit microprocessor assembly language instruction

ExampleMACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 36

Source code (with macro)

Macro Processor

Expanded code

Compiler Assembler

SP NOTES

A macro definition is placed at the start of a program enclosed between the statements MACRO and ENDM A MACRO statement indicates that a macro definition starts while statement ENDM indicates the end of a macro definition Thus a group of statements start ing with MACRO and ending with ENDM constitutes one macro definition unit If many macros are to be defined in a program as many definition

modules will exist at the start of the program Each definition module contains a new operation and defines it to consist of a sequence of assembly language statement In example above INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence

The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement Appearance of a macro name in the mnemonic field amounts to a call on the macro The assembler replaces such a statement by the statement sequence comprising the macro This is known as macro expansion

INCRMT XY

is shown to lead to insertion of the assembly statements

LOAD X

ADD Y

STORE X

in its place All macro calls in a program are expanded in this fashion

Defining a Macro

Let us take another look at the macro definition unit

MACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

The macro header statement indicates the existence of a macro definition unit Absence of the header statement as the first statement of a program or ft first statement following a macro definition unit signals the start of the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 37

SP NOTES

main assembly language program The next statement in the definition unit is die prototype for a macro call This statement names the macro and indicates how the operands in any call on the macro would be written

The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion

Basic Macro Processor Functions

-gtDirectives used during usage of Macro

Macro Indicates begin of Macro MEND indicates end of Macro

-gtPrototype for Macro

Each argument starts with Name and macro Parameter list

MEND

-gtBODY The statement will be generated as the expansion of Macro

Macro Expansion

Each macro invocation statement will be expanded into the statements that form the body of the macro

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 38

SP NOTES

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)

In the definition of macro parameter In the macro invocation argument

Lecture_no23 Machine-Independent Macro Processor Features

MACRO INVOCATION

A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro

macro_name p1 p2 hellip

Difference between macro call and procedure call

Macro call statements of the macro body are expanded each time the macro is invoked

Procedure call statements of the subroutine appear only one regardless of how many times the subroutine is called

Macro parameters

i) Positional parameter

The prototype statement indicates how operands in a macro call would be written These operands are called parameters or arguments All parameters used in the prototype statement have names starting with the special character amp These parameters are known as formal parameters A macro call is written using parameter names which do not start with ft special character amp These are known as actual parameters

The lists of formal and actual parameters also called as formal and actual parameter lists specified in the prototype and macro call statements respectively establish a correspondence between each formal parameter and an actual parameter In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists Thus the first actual parameter in the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 39

SP NOTES

fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again

INCRMT ampA ampB prototype

INCRMT XY macro call

We see that X would be paired with ampA and Y with ampB While expanding a macro call any formal parameter appearing within a model statement is replaced by the corresponding actual parameter This is how expansion of the call INCR XY heads to the following statements

LOAD X

ADD Y

STORE X

Example

STRG DATA1 DATA2 DATA3

GENER DIRECT3

(ii) Keyword parameter

Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter

ExampleSTRG ampa3=DATA1 ampa2=DATA2 ampa1=DATA3

GENER TYPE=DIRECT CHANNEL=3

Features of macro facility

(1)Macro instruction argument

(2)Macro calls within macro(nested macro calls)

(3)Macro instruction defining(macros within macro)

(4)Conditional macro ExpansionLecture_no24 Design of 1-pass macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 40

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Lecture_no5 Evolution of operating systemGeneral machine structure

OSJobmultiprogrammingMultitasking

FragmentationPagingScheduling

Facilities of computer os

Operating Systembull It is the collection of system programs which acts as an interface between user and the computer and computer hardware

bull The purpose of an operating system is to provide an environment in which A user can execute programs in a convenient manner

Functions of Operating System

bull File handling and management

bull Storage management (Memory management)

bull Device scheduling and management

bull CPU scheduling

bull Information management

bull Process control (management)

bull Error handling

bull Protecting itself from user amp protecting user from other users

What is MFT

What is MVT

What is Fragmentation

What is paging Explain its various types

What is scheduling

What is meant by the term time sharing

What is virtual memoryWhat are simple batched systems Explain

Give an evolution of an operating system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 8

SP NOTES

Lecture_no6 Description of Von-Neumann machines components

General machine structureVon-Neumannrsquos componentsBlock diagram and functions of each component of Von-neumannrsquos machine

1 The von Neumann Computer Modelbull Von Neumann computer systems contain three main building blocks-gt the central processing unit (CPU)-gt memory-gt and inputoutput devices (IO)bull These three components are connected together using the system busbull The most prominent items within the CPU are the registers they can be manipulated directly by acomputer program The following block diagram shows major relationship between CPU components

(Design of the VonNeumann architecture)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 9

SP NOTES

2 Components of the Von Neumann Model-gt Memory Storage of information (dataprogram)-gt Processing Unit ComputationProcessing of Information-gt Input Means of getting information into the computer eg keyboard mouse-gtOutput Means of getting information out of the computer eg printer monitor bull Control Unit Makes sure that all the other parts perform their tasks correctly and at the correct time The von Neumann Machine

(connection between CPU amp Main memory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 10

SP NOTES

MBR (Memory Buffer Register) = MBR contains a word to be stored in memory or is sued to receive from memory

MAR (Memory Address Register) = MAR specifies the address in memory of the word to be written from or read into the MBR

IR (Instruction Register) = IR contains the 8-bit op-code instruction being executedIBR (Instruction Buffer Register) = IBR is employed to hold temporarily the right-hand instructions

from a word in memoryPC (Program Counter) = PC contains the address of the next instruction-pair to be fetched from

memoryAC and MQ (Accumulator and Multiplier Quotient) = AC and MQ are employed to hold temporarily

operands and results of ALU operations

Fetch The process of retrieving a value from a memory location in order to put it in a register or pass it to a processing unit

IO controller A special-purpose processor that mediates between the main computer processor and a specific input or output device The processor records the request made by the main processor leaving it free to continue other tasks and interrupts the main processor when the input or output task is complete

Input Output The subsystem responsible for managing communications with external devices whether external memory storage other processors or devices for human interaction

Instruction set The set of codes that describe all the legal operations a particular computer can execute

Op code An unsigned binary integer that is assigned to a specific task the hardware can perform

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 11

SP NOTES

Register A special kind of very fast memory which is referred to by name rather than by a memory address number Registers often hold data values that are currently in use by the ALU or other subsystems of the architecture

Random access memory (RAM) The typical computer memory where each cell is addressed individually and may be updated or accessed in time that is independent of the cellrsquos location

Read-only memory (ROM) A memory that is usually very similar to RAM except it may only be accessed not updated

Store The process of writing a value to a memory cell

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 12

SP NOTES

Lecture_no7 Instruction format with microflowchart for ADD instruction

Flowchart for instruction fetchamp execution of general machine structure

Typical operating system Information flow in microflowchart for ADDSUBMULT

Memory operations

FETCH (addr)

1Put addr into MAR2Tell memory unit to ldquoloadrdquo3Memory copies data into MDR

STORE(addrnew-value)

1 Put addr into MAR2 Put new-value into MDR3 Tell memory unit to ldquostorerdquo4 Memory stores data from MDR into memory cell

Problem 1)

a) Write an assembly program for a GPR machine having 8 general purpose registers (R1R7) to evaluate the following expression Ignore the remainder part of the division process

( (a + b) (a + d) ) (a e)

where abde are variables with addresses ABDE respectively and each operation is defined as follows + --------gt ADD Ri Rj ( Rj lt- Ri + Rj) --------gt MULT Ri Rj ( Rj lt- Ri Rj) --------gt DIV Ri Rj ( Rj lt- Rj Ri)

a = M[A] ------gt content of memory location at address A same for B D E

move A D0 D0 = amove B D4 D4 = badd D0 D4 D4 = a + bmove D D1 D1 = dadd D0 D1 D1 = a + ddiv D1D4 D4 = (a + b) (a + d)move E D2 D2 = emult D0 D2 D2 = (a e)mult D4 D2 D2 = ( (a + b) (a + d) ) (a e)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 13

SP NOTES

Problem 2)

a) Provide micro-instructions in the form of detailed flowchart for the execution of the following instructions

b) Show the flow of the micro-instructions within the GPR architecture by highlighting the system buses and the registers used

i)move X D0 move contents of memory location X to register D0ii)move D0 X move contents of register D0 to memory location Xiii)movea 22(a3)25(a1) move contents from memory to memoryiv)bgtw label branch if greater than to location ldquolabelrdquo

Answer 2)

For each of the above for part b) draw the CPU consisting of all the registers (for example MAR MBR PC IR Address and Data Registers ALU) and Memory and show each of the micro-instruction as in part a)

i) move X D0Machine Instruction

0011 0000 0011 1000

Address of X

a) Micro-steps are

MAR lt-- PC

MBR lt-- M[MAR] fetch

IR lt-- MBR

| decode |PC lt-- PC + 2 PC points to source addressMAR lt-- PCMBR lt-- M[MAR] address of X in MBRMAR MBR move X to MARMBR M[MAR] read contents of location XD0 MBR PC lt-- PC + 2 PC points to next instruction

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 14

SP NOTES

ii) move D0 X

Machine Instruction

0011 1110 0000 0000

Address of X

a) Micro-steps areMAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | decode |PC lt-- PC + 2 PC points to destination addressMAR lt-- PCMBR M[MAR] read destination address XMAR MBR MAR has X now

MBR D0 source data in MBR

M[MAR] MBR data in Effective address X (which is in MAR)

PC lt-- PC + 2 PC points to next instruction

iii) movea 22(a3)25(a1)

Machine Instruction

0011 0011 0110 1011

0000 0000 0001 0110

0000 0000 0001 1001

a)Source Effective address = contents of a3 + $16temp lt--- M[a3 + $16]

Destination Effective address = contents of a1 + $19M[a1 + $19] lt- temp

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 15

SP NOTES

Micro-steps areMAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | decode |PC lt-- PC + 2 PC points to source displacementMAR lt-- PCMBR lt-- M[MAR] source displacement loadedMAR lt-- [A3 + MBR] source effective address calculatedMBR lt- M[MAR] source data in MBRWR lt- MBR saved temporarily since MBR is to be reusedPC lt-- PC + 2 PC points to destination displacementMAR lt-- PCMBR lt-- M[MAR] destination displacement loaded

MAR lt-- [A1 + MBR] destination effective address calculatedMBR lt- WR source data back to MBR

M[MAR] lt-- MBR source data moved to destination

PC lt-- PC + 2 PC points to next instruction

iv) bgtw label

Machine Instruction

0110 1110 0000 0000 ---- displacement ----

a)MAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | interpret decode |

TRUE FALSE | |

PC lt-- PC + 2 PC lt- PC + 4 MAR lt-- PC MBR lt-- M[MAR] get displacement PC lt-- PC + MBR execute

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 16

SP NOTES

Lecture_no8 Memory unit Register Data of IBM 360

General machine structure of IBM 360370

The IBM System360 (S360) was a mainframe computer system family announced by IBM on April 7 1964 and delivered between 1965 and 1978 It was the first family of computers designed to cover the complete range of applications from small to large both commercial and scientific

(i)Memory unit

Memory (storage) in System360 is addressed in terms of 8-bit bytes Various instructions operate on larger units called halfword (2 bytes) fullword (4 bytes) doubleword (8 bytes) quad word (16 bytes) and 2048 byte storage block specifying the leftmost (lowest address) of the unit Within a halfword fullword doubleword or quadword low numbered bytes are more significant than high numbered bytes this is sometimes referred to as big-endian Many uses for these units require aligning them on the corresponding boundaries Within this article the unqualified term word refers to a fullword

The architecture of System360 provided for up to 224 = 16777216 bytes of memory however the Model 67 extended the architecture and allowed 232 = 4294967296 bytes of virtual memory

(ii)Register-

Registers are areas of storage that are set aside in the processing unit

There are 16 registers in the 370 system (General purpose registerFloating-point registerControl amp status registerdata register address register)

Four fields of information-1) the name field- optional field and must begin with a character from the alphabet can consist of 8 characters2) opcode ndash mandatory field and must be followed by a blank3) operands ndash usually represent data that could be in main storage registers or as part of the instruction itself4) comments ndash used to clarify instructions

(iii)Data

Characters are stored as 8-bit bytes Integers are stored as twos complement binary halfword or fullword values Packed decimal numbers are stored as 1-16 8-bit bytes containing an odd number of decimal digits

followed by a 4-bit sign Sign values of hexadecimal A C E and F are positive and sign values of hexadecimal B and D are negative Digit values of hexadecimal A-F and sign values of 0-9 are invalid but the PACK and UNPK instructions do not test for validity

Zoned decimal numbers are stored as 1-16 8-bit bytes each containing a zone in bits 0-3 and a digit in bits 4-7 The zone of the rightmost byte is interpreted as a sign

Floating point numbers are only stored as fullword or doubleword values on older models On the 36085and 360195there are also extended precision floating point numbers stored as quadwords For all

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 17

SP NOTES

three formats bit 0 is a sign and bits 0-7 are a characteristic (exponent biased by 64) Bits 8-31 (8-63) are a hexadecimal fraction For extended precision the low order doubleword has its own sign and characteristic which are ignored on input and generated on output

Lecture_no9 Various instruction format of IBM 360370

( iv)Instruction

Instructions in the S360 are two four or six bytes in length with the opcode in byte 0 Instructions have one of the following formats

RR (two bytes) R tells that there is data residing in a register So in this type there is data in register R

Generally byte 1 specifies two 4-bit register numbers but in some cases eg SVC byte 1 is a single 8-bit immediate field

RS (four bytes) This instruction can have 2 or 3 operands It depends how many the opcode indicates If it is 2 the first is a register and the second storage If it is 3 then the first and second are registers and the third is a storage location

Byte 1 specifies two register numbers bytes 2-3 specify a base and displacement

RX (four bytes) The X also refers to a storage address however this specifies actual storage addresses in 3 pieces

Byte 1 bits 0-3 specifies either a register number or a modifier byte 1 bits 4-7 specifies the number of the general register to be used as an index bytes 2-3 specify a base and displacement

SI (four bytes) I tells that the operand consists of immediate data (meaning that the data is in the instruction itself) So in this type the first is a storage location and the second is immediate data

Byte 1 specifies an immediate field bytes 2-3 specify a base and displacement

SS (six bytes) S tells that that there is data in storage Here both operands have data in storage

Byte 1 specifies two 4-bit length fields or one 8-bit length field bytes 2-3 and 4-5 each specify a base and displacement The encoding of the length fields is length-1

Instructions must be on a two-byte boundary in memory hence the low-order bit of the instruction address is always 0

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 18

SP NOTES

Addressing Mode

1Implied mode Operand is in a register implied by the OpCode of the instruction ExampleADD 312 Immediate mode Operand is a constant value specified in the instruction itself ExampleADD R 103 Register mode Operand is in register specified in the instruction itself ExampleADD S D

4 Register indirect mode Operand are is in a memory address that is content of a register specifiedby the instruction itself ExampleADD (D) 35 Direct mode Absolute address of operand is explicitly specified in instruction ExampleADD 1234 S6 Indirect mode Absolute address of operand is content of a memory address ExampleADD [1234] 107 Relative mode Content of PC + OpAddr ExampleADD D $S8 Indexed mode Content of an index register + OpAddr ExampleADD D 500(S)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 19

SP NOTES

Lecture_no10 Machine language Long way no looping method

Elementary Instruction Set

Typical Data Transfer Instructions

Typical Arithmetic Instructions

Typical Logical and Bit Manipulation Instr

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 20

SP NOTES

Example Write a program that compare two positive numbers x and y Output is -1 if x lt y 0if x = y +1 if x gt y

MOVE R1 x MOVE R2 y MOVE R3 0 SUB R1 R2 BN Less BZ Zero MOVE R3 +1Less MOVE R3 -1Zero MOVE R3 0 HLT

Lecture_no11 Address modification using instruction as data ampusing index register

Lecture_no12 Looping method with example

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 21

SP NOTES

Lecture_no13 Assembly language machine language

Assembly LanguageAssembly instructions are just shorthand for machine instructions

Machine language Equivalent assembly1000000100100101 LOAD R1 51000000101000101 LOAD R2 51010000100000110 ADD R0 R1 R21000001000000110 SAVE R0 61111111111111111 HALT (For all assembly instructions that compute a result the first argument is the destination)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 22

SP NOTES

Very easy to write an Assembly Language -gtMachine language translator

Exercise

What would be the assembly instructions to swap the contents of registers 1 amp 2Exercise Solution

STORE 1 R1MOVE R1 R2LOAD R2 1HALT

Machine Language

The machine language for a particular computer is tied to the architecture of the CPU For example G4 Macs have a different machine language than Intel PCs We will look at the machine language of a simple simulated computer

Machine Language ExampleOur computer has 4 registers and 32 memory locations Each instruction is 16 bits Here is a machine language program for our simulated computer

10000001001001011000000101000101101000010000011010000010000001101111111111111111

LOAD contents of memorylocation into register

100000010 RR MMMMMex R0 = Mem[3]100000010 00 00011

STORE contents of registerinto memory location

100000100 RR MMMMMex Mem[4] = R0100000100 00 00100

MOVE contents of oneregister into another register

100100010000 RR RRex R0 = R1100100010000 00 01

ADD contents of 2 registersstore result in third

1010000100 RR RR RRex R0 = R1 + R21010000100 00 01 10

SUBTRACT contents of 2registers store result into third

1010001000 RR RR RRex R0 = R1 ndash R2

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 23

SP NOTES

1010001000 00 01 10

Halt the program 1111111111111111

Lecture_no14 Meaning of Assembly language program types Literals

bullUSING

bullBALR

bullSTART

bullEND

bullBR

bullLiteral

bullLTORG

bullBCT

bullEQU ExampleSTART EQU 10

The directive EQU stands for equals it allows you to give a numerical value to a label In this example the assembler would always replace START with 1016 Hence you could write

START EQU 10ORG STARTGOTO 10

bullORG

The directive ORG does not get translated into any code what it does is to set the address at which the next instruction will be stored

Example

ORG 0GOTO 10

The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000

But it will be placed at the address 0 ORG 0 does not get translated into any code but it affects where the code for GOTO 10 is placed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 24

SP NOTES

bullpseudo-ops vs machine-ops

Lecture_no15 Example on Assembly language program

Assembly language is a low-level programming language for a computer or other programmable device specific to a particular computer architecture in contrast to most high-level programming languages which are generally portable across multiple systems Assembly language is converted into executable machine code by a utility program referred to as an assembler like NASM MASM etc

LABEL OPCODE OPERANDS COMMENTS01 02 Program to multiply an integer by the constant 603 Before execution an integer must be stored in NUMBER0405 ORIG x305006 LEA R2 NUMBER07 LDW R2 R2 008 LEA R1 SIX09 LDW R1 R1 00A AND R3R30 Clear R3 It will0B contain the product0C The inner loop0D 0E AGAIN ADD R3R3R20F ADD R1R1-1 R1 keeps track of10 BRp AGAIN the iterations11 12 HALT13

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 25

SP NOTES

14 NUMBER BLKW 115 SIX FILL x000616 17 END

Figure 71 An assembly language programTwo of the parts (LABEL and COMMENTS) are optionalOpcodes and OperandsTwo of the parts (OPCODE and OPERANDS) are mandatory The OPCODE is a symbolic name for the opcode

The number of operands depends on the operation being performedAGAIN ADD R3R3R2The operands to be added are obtained from register 2 and from register 3 The resultis to be placed in register 3 We represent each of the registers 0 through 7 as R0 R2helliphelliphelliphellip R7

The location whose address is to be read is given the label NUMBER The destination into which that address is to be loaded is register 2LEA R2 NUMBERA literal value must contain a symbol identifying the representation base of the number We use for decimal x for hexadecimal and b for binary

LabelsLabels are symbolic names which are used to identify memory locations that are referred to explicitly in the programThere are two reasons for explicitly referring to a memory location1 The location contains the target of a branch instruction (for example AGAIN in line 0E)2 The location contains a value that is loaded or stored (for example NUMBER line 14 and SIX line 15)The location AGAIN is specifically referenced by the branch instruction in line 10

BRp AGAIN

If the result of ADD R1R1ndash1 is positive (as evidenced by the P condition code being set) then the program branches to the location explicitly referenced as AGAIN to perform another iterationThe value stored in the memory location explicitly referenced as NUMBER is loaded into R2If a location in the program is not explicitly referenced then there is no need to give it a label

CommentsComments are messages intended only for human consumption The purpose of comments is to make the program more comprehensible to the human reader

Pseudo-ops (Assembler Directives)

Actually a more formal name for a pseudo-op is assembler directive They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution Rather the pseudo-op is strictly a message to the assembler to help the assembler in the assembly processORIG

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 26

SP NOTES

ORIG tells the assembler where in memory to place the program In line 05 ORIG x3050 says start with location x3050 As a result the LEA R2NUMBER instruction will be put in location x3050FILLFILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand In line 15 the ninth location in the resultant program is initialized to the value x0006BLKWBLKW tells the assembler to set aside some number of sequential memory locations (ie a BLocK ofWords) in the program The actual number is the operand of the BLKW pseudo-op In line 11 the pseudo-op instructs the assembler to set aside one location in memory

Lecture_no16 surprise test

Lecture_no17 MODULE-2

AssemblerWhy learning AssemblerAssembler or other languages that is the question Why should I learn another language if I alreadylearned other programming languages The best argument while you live in France you are able to getthrough by speaking english but you will never feel at home then and life remains complicated You canget through with this but it is rather inappropriate If things need a hurry you should use the countryslanguageAssemblerTo translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmerDisassenblerTo translate machine code to their assembly source

Cross AssemblerThe assembler may run on one machine and produce object code for another

An assembly is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader

(Fig Assembler)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 27

SP NOTES

For example the externally defined symbols (library program) must be indicated to the loaderthe assembler does not know the address of these symbols and it is up to the loader to find theprograms containing them load them into memory and place the values of these symbols in thecalling program

(Fig Program Translation Steps)

Fundamental assembler functions1048698 Translate mnemonic operation codes their machine language equivalents1048698 Assign machine addresses to symbolic labels used by the programmer

Basic Assembler Functions

Convert mnemonic operation codes to machine language equivalents

Convert symbolic operands to machine addresses (pass 1)

Build machine instructions in the proper format

Convert data constants to internal (machine)representations

Error checking is provided

Write the object program and assembly listing files

Assembler directives

-gtThe assembler can also process assembler directivesbull Assembler directives (or pseudo-instructions) provide instructions to the assembler itself Theyare not translated into machine instructions

-gtPseudo-Instructions Not translated into machine instructions Providing information to the assembler-gtBasic assembler directives-

bull START -Specify name and staring address for the program END- Indicate the end of the source program and (optionally) specify the first executable instruction in

the program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 28

SP NOTES

BYTE -Generate character or hexadecimal constant occupying as many bytes as needed to represent the constant

WORD- Generate one-word integer constant RESB- Reserve the indicated number of bytes for a data area RESW- Reserve the indicated number of words for a data area

Lecture_no18 General design procedure of 360-type Assembler

Syntax Assembly language statements

Assembly language statements are entered one statement per line Each statement follows the following format

[label] mnemonic [operands] [comment]

The fields in the square brackets are optional A basic instruction has two parts the first one is the name of the instruction (or the mnemonic) which is to be executed and the second are the operands or the parameters of the command

An assembly language statement contains the following fields -gt Label Field -it is an identifier amp optional fieldThe maximum lengnth of label differs between assemblers It can be used to define a symbolSome accepts upto 32 characters long other only 4 charactersA label when declared is suffixed by a colon and begins with valid characters(AhelliphellipZ)

Ex-

START LDAA 24 H

Here Label START is equal to the address of the instruction LDAA 24 H -gt MnemonicOpcode -This field contains a mnemonic It stands for operation code or pseudo-opie a machine code instruction it requires additional information ieoperands -gt Operand Field -it specifies either the address or the data that the opcode requires -gt Comment Field allows the programmer to document the software This field is optional and is only printed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 29

SP NOTES

on the source listing for documentation purposes The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character

Types of Assembly language statements

It consists of 3 typeseg

(i)Imperative statement

(ii)Declarative statements

(iii)Assembler directive statements

Advantages amp disadvantages of Assembly language

Lecture_no19 Design Specification of 1-pass assembler with example

Following steps are used for design specification of an assembler-1)Identify the information necessary to perform a task2)Specify the data structure and define format of data structure3)Determine the processing necessary to obtain and maintain the information4)Determine the processing necessary to perform a task

Assembler Design can be done in

ndashSingle pass

ndashTwo pass

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 30

Source program

Pass1

OPTAB SYMTAB

Intermediate file

Pass2

SYMTAB

Object codes

OPTAB

SP NOTES

The Intermediate file include each source statement assigned address and error indicator

Pass I

Separate the symbols mnemonic op-code and operational fields Determine the storage requirement for every assembly language statement and up date the location

counter Build the symbol table (Table that is used to store each label and its corresponding value)

Pass II

Generate object code(perform actual translation)

One-Pass Assembler

The translation performed by an assembler is essentially a collection of substitutions machine operation code

for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines

bullSingle Pass AssemblerndashDoes everything in single passndashCannot resolve the forward referencing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 31

LOCCTR

SP NOTES

(detailed pass1 assembler flowchart)

Internal data structures1048698 the Operation Code Table (OPTAB) bull OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents1048698 the Symbol Table (SYMTAB) bull SYMTAB is used to store values (addresses) assigned to labels1048698 Location Counter (LOCCTR) bull This is a variable that is used to help in the assignment of address LOCCTR not LOCCTR + (instruction length)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 32

SP NOTES

Lecture_no20 Multi-pass assembler with example

Pass 1 Assign addresses to all statements in the program Save the values assigned to all labels for use in Pass 2 Perform some processing of assembler directivesPass 2 Assemble instructions Generate data values defined by BYTE WORD Perform processing of assembler directives not done in Pass 1 Write the object program and the assembly listing

(Detailed pass2 assembler flowchart)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 33

SP NOTES

Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler

The difference between one pass and two pass assemblers are-A one pass assembler passes over the source file exactly once in the same pass collecting the labels resolving future references and doing the actual assembly The difficult part is to resolve future label references and assemble code in one pass

A two pass assembler does two passes over the source file ( the second pass can be over a file generated in the first pass ) In the first pass all it does is looks for label definitions and introduces them in the symbol table In the second pass after the symbol table is complete it does the actual assembly by translating the operations and so on

Assembler Design Machine Dependent Assembler Features instruction formats and addressing modes program relocation

Machine Independent Assembler Features literals symbol-defining statements expressions program blocks control sections and program linking

Object Program

Header Col 1 H Col 2~7 Program name Col 8~13 Starting address (hex) Col 14-19 Length of object program in bytes (hex) Text Col1 T Col2~7 Starting address in this record (hex) Col 8~9 Length of object code in this record in bytes (hex) Col 10~69Object code (69-10+1)6=10 instructions End Col1 E Col 2~7 Address of 16 Col2 first executable instruction (hex) (END program_name)

Absolute ProgramThe program must be loaded at the specified (absolute) address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 34

SP NOTES

Relocatable Program

The actual starting address is not known until load time The program can be loaded at different address of memory

How to Solvea Modification recordb Base relativec Program counter relative

Modification record Col 1 M Col 2-7 Starting location of the address field to be modified relative to the beginning of the program Col 8-9 length of the address field to be modified in halfbytes

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 35

SP NOTES

Lecture_no22 Macro language amp Macro processor

MACRO PROCESSOR

The assembly language programmer often finds it necessary to repeat some statements or block of code several times in a program The block may consist of code to swap sets of registers do some arithmetic operations In this situation the programmer find a macro instruction facility useful Macro instruction (often called macros) are single line abbreviation for group of instructions In employing a macro the programmer essentially defines a single instruction to represent a block of code For every occurrence of this one-line macro instruction in his program the macro processing assembler substitute the entire block

A macro represents a commonly used group of statements in the source pro ramming language The macro processor replaces each macro instruction with the corresponding group of source language statements This is called expanding of macros

Obj

Basic Macro Processor Functions

Macro Definition and Usage

In this section we will present one more example to highlight salient aspects of macro-processor The example is very similar to Intels 8 bit microprocessor assembly language instruction

ExampleMACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 36

Source code (with macro)

Macro Processor

Expanded code

Compiler Assembler

SP NOTES

A macro definition is placed at the start of a program enclosed between the statements MACRO and ENDM A MACRO statement indicates that a macro definition starts while statement ENDM indicates the end of a macro definition Thus a group of statements start ing with MACRO and ending with ENDM constitutes one macro definition unit If many macros are to be defined in a program as many definition

modules will exist at the start of the program Each definition module contains a new operation and defines it to consist of a sequence of assembly language statement In example above INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence

The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement Appearance of a macro name in the mnemonic field amounts to a call on the macro The assembler replaces such a statement by the statement sequence comprising the macro This is known as macro expansion

INCRMT XY

is shown to lead to insertion of the assembly statements

LOAD X

ADD Y

STORE X

in its place All macro calls in a program are expanded in this fashion

Defining a Macro

Let us take another look at the macro definition unit

MACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

The macro header statement indicates the existence of a macro definition unit Absence of the header statement as the first statement of a program or ft first statement following a macro definition unit signals the start of the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 37

SP NOTES

main assembly language program The next statement in the definition unit is die prototype for a macro call This statement names the macro and indicates how the operands in any call on the macro would be written

The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion

Basic Macro Processor Functions

-gtDirectives used during usage of Macro

Macro Indicates begin of Macro MEND indicates end of Macro

-gtPrototype for Macro

Each argument starts with Name and macro Parameter list

MEND

-gtBODY The statement will be generated as the expansion of Macro

Macro Expansion

Each macro invocation statement will be expanded into the statements that form the body of the macro

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 38

SP NOTES

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)

In the definition of macro parameter In the macro invocation argument

Lecture_no23 Machine-Independent Macro Processor Features

MACRO INVOCATION

A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro

macro_name p1 p2 hellip

Difference between macro call and procedure call

Macro call statements of the macro body are expanded each time the macro is invoked

Procedure call statements of the subroutine appear only one regardless of how many times the subroutine is called

Macro parameters

i) Positional parameter

The prototype statement indicates how operands in a macro call would be written These operands are called parameters or arguments All parameters used in the prototype statement have names starting with the special character amp These parameters are known as formal parameters A macro call is written using parameter names which do not start with ft special character amp These are known as actual parameters

The lists of formal and actual parameters also called as formal and actual parameter lists specified in the prototype and macro call statements respectively establish a correspondence between each formal parameter and an actual parameter In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists Thus the first actual parameter in the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 39

SP NOTES

fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again

INCRMT ampA ampB prototype

INCRMT XY macro call

We see that X would be paired with ampA and Y with ampB While expanding a macro call any formal parameter appearing within a model statement is replaced by the corresponding actual parameter This is how expansion of the call INCR XY heads to the following statements

LOAD X

ADD Y

STORE X

Example

STRG DATA1 DATA2 DATA3

GENER DIRECT3

(ii) Keyword parameter

Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter

ExampleSTRG ampa3=DATA1 ampa2=DATA2 ampa1=DATA3

GENER TYPE=DIRECT CHANNEL=3

Features of macro facility

(1)Macro instruction argument

(2)Macro calls within macro(nested macro calls)

(3)Macro instruction defining(macros within macro)

(4)Conditional macro ExpansionLecture_no24 Design of 1-pass macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 40

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Lecture_no6 Description of Von-Neumann machines components

General machine structureVon-Neumannrsquos componentsBlock diagram and functions of each component of Von-neumannrsquos machine

1 The von Neumann Computer Modelbull Von Neumann computer systems contain three main building blocks-gt the central processing unit (CPU)-gt memory-gt and inputoutput devices (IO)bull These three components are connected together using the system busbull The most prominent items within the CPU are the registers they can be manipulated directly by acomputer program The following block diagram shows major relationship between CPU components

(Design of the VonNeumann architecture)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 9

SP NOTES

2 Components of the Von Neumann Model-gt Memory Storage of information (dataprogram)-gt Processing Unit ComputationProcessing of Information-gt Input Means of getting information into the computer eg keyboard mouse-gtOutput Means of getting information out of the computer eg printer monitor bull Control Unit Makes sure that all the other parts perform their tasks correctly and at the correct time The von Neumann Machine

(connection between CPU amp Main memory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 10

SP NOTES

MBR (Memory Buffer Register) = MBR contains a word to be stored in memory or is sued to receive from memory

MAR (Memory Address Register) = MAR specifies the address in memory of the word to be written from or read into the MBR

IR (Instruction Register) = IR contains the 8-bit op-code instruction being executedIBR (Instruction Buffer Register) = IBR is employed to hold temporarily the right-hand instructions

from a word in memoryPC (Program Counter) = PC contains the address of the next instruction-pair to be fetched from

memoryAC and MQ (Accumulator and Multiplier Quotient) = AC and MQ are employed to hold temporarily

operands and results of ALU operations

Fetch The process of retrieving a value from a memory location in order to put it in a register or pass it to a processing unit

IO controller A special-purpose processor that mediates between the main computer processor and a specific input or output device The processor records the request made by the main processor leaving it free to continue other tasks and interrupts the main processor when the input or output task is complete

Input Output The subsystem responsible for managing communications with external devices whether external memory storage other processors or devices for human interaction

Instruction set The set of codes that describe all the legal operations a particular computer can execute

Op code An unsigned binary integer that is assigned to a specific task the hardware can perform

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 11

SP NOTES

Register A special kind of very fast memory which is referred to by name rather than by a memory address number Registers often hold data values that are currently in use by the ALU or other subsystems of the architecture

Random access memory (RAM) The typical computer memory where each cell is addressed individually and may be updated or accessed in time that is independent of the cellrsquos location

Read-only memory (ROM) A memory that is usually very similar to RAM except it may only be accessed not updated

Store The process of writing a value to a memory cell

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 12

SP NOTES

Lecture_no7 Instruction format with microflowchart for ADD instruction

Flowchart for instruction fetchamp execution of general machine structure

Typical operating system Information flow in microflowchart for ADDSUBMULT

Memory operations

FETCH (addr)

1Put addr into MAR2Tell memory unit to ldquoloadrdquo3Memory copies data into MDR

STORE(addrnew-value)

1 Put addr into MAR2 Put new-value into MDR3 Tell memory unit to ldquostorerdquo4 Memory stores data from MDR into memory cell

Problem 1)

a) Write an assembly program for a GPR machine having 8 general purpose registers (R1R7) to evaluate the following expression Ignore the remainder part of the division process

( (a + b) (a + d) ) (a e)

where abde are variables with addresses ABDE respectively and each operation is defined as follows + --------gt ADD Ri Rj ( Rj lt- Ri + Rj) --------gt MULT Ri Rj ( Rj lt- Ri Rj) --------gt DIV Ri Rj ( Rj lt- Rj Ri)

a = M[A] ------gt content of memory location at address A same for B D E

move A D0 D0 = amove B D4 D4 = badd D0 D4 D4 = a + bmove D D1 D1 = dadd D0 D1 D1 = a + ddiv D1D4 D4 = (a + b) (a + d)move E D2 D2 = emult D0 D2 D2 = (a e)mult D4 D2 D2 = ( (a + b) (a + d) ) (a e)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 13

SP NOTES

Problem 2)

a) Provide micro-instructions in the form of detailed flowchart for the execution of the following instructions

b) Show the flow of the micro-instructions within the GPR architecture by highlighting the system buses and the registers used

i)move X D0 move contents of memory location X to register D0ii)move D0 X move contents of register D0 to memory location Xiii)movea 22(a3)25(a1) move contents from memory to memoryiv)bgtw label branch if greater than to location ldquolabelrdquo

Answer 2)

For each of the above for part b) draw the CPU consisting of all the registers (for example MAR MBR PC IR Address and Data Registers ALU) and Memory and show each of the micro-instruction as in part a)

i) move X D0Machine Instruction

0011 0000 0011 1000

Address of X

a) Micro-steps are

MAR lt-- PC

MBR lt-- M[MAR] fetch

IR lt-- MBR

| decode |PC lt-- PC + 2 PC points to source addressMAR lt-- PCMBR lt-- M[MAR] address of X in MBRMAR MBR move X to MARMBR M[MAR] read contents of location XD0 MBR PC lt-- PC + 2 PC points to next instruction

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 14

SP NOTES

ii) move D0 X

Machine Instruction

0011 1110 0000 0000

Address of X

a) Micro-steps areMAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | decode |PC lt-- PC + 2 PC points to destination addressMAR lt-- PCMBR M[MAR] read destination address XMAR MBR MAR has X now

MBR D0 source data in MBR

M[MAR] MBR data in Effective address X (which is in MAR)

PC lt-- PC + 2 PC points to next instruction

iii) movea 22(a3)25(a1)

Machine Instruction

0011 0011 0110 1011

0000 0000 0001 0110

0000 0000 0001 1001

a)Source Effective address = contents of a3 + $16temp lt--- M[a3 + $16]

Destination Effective address = contents of a1 + $19M[a1 + $19] lt- temp

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 15

SP NOTES

Micro-steps areMAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | decode |PC lt-- PC + 2 PC points to source displacementMAR lt-- PCMBR lt-- M[MAR] source displacement loadedMAR lt-- [A3 + MBR] source effective address calculatedMBR lt- M[MAR] source data in MBRWR lt- MBR saved temporarily since MBR is to be reusedPC lt-- PC + 2 PC points to destination displacementMAR lt-- PCMBR lt-- M[MAR] destination displacement loaded

MAR lt-- [A1 + MBR] destination effective address calculatedMBR lt- WR source data back to MBR

M[MAR] lt-- MBR source data moved to destination

PC lt-- PC + 2 PC points to next instruction

iv) bgtw label

Machine Instruction

0110 1110 0000 0000 ---- displacement ----

a)MAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | interpret decode |

TRUE FALSE | |

PC lt-- PC + 2 PC lt- PC + 4 MAR lt-- PC MBR lt-- M[MAR] get displacement PC lt-- PC + MBR execute

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 16

SP NOTES

Lecture_no8 Memory unit Register Data of IBM 360

General machine structure of IBM 360370

The IBM System360 (S360) was a mainframe computer system family announced by IBM on April 7 1964 and delivered between 1965 and 1978 It was the first family of computers designed to cover the complete range of applications from small to large both commercial and scientific

(i)Memory unit

Memory (storage) in System360 is addressed in terms of 8-bit bytes Various instructions operate on larger units called halfword (2 bytes) fullword (4 bytes) doubleword (8 bytes) quad word (16 bytes) and 2048 byte storage block specifying the leftmost (lowest address) of the unit Within a halfword fullword doubleword or quadword low numbered bytes are more significant than high numbered bytes this is sometimes referred to as big-endian Many uses for these units require aligning them on the corresponding boundaries Within this article the unqualified term word refers to a fullword

The architecture of System360 provided for up to 224 = 16777216 bytes of memory however the Model 67 extended the architecture and allowed 232 = 4294967296 bytes of virtual memory

(ii)Register-

Registers are areas of storage that are set aside in the processing unit

There are 16 registers in the 370 system (General purpose registerFloating-point registerControl amp status registerdata register address register)

Four fields of information-1) the name field- optional field and must begin with a character from the alphabet can consist of 8 characters2) opcode ndash mandatory field and must be followed by a blank3) operands ndash usually represent data that could be in main storage registers or as part of the instruction itself4) comments ndash used to clarify instructions

(iii)Data

Characters are stored as 8-bit bytes Integers are stored as twos complement binary halfword or fullword values Packed decimal numbers are stored as 1-16 8-bit bytes containing an odd number of decimal digits

followed by a 4-bit sign Sign values of hexadecimal A C E and F are positive and sign values of hexadecimal B and D are negative Digit values of hexadecimal A-F and sign values of 0-9 are invalid but the PACK and UNPK instructions do not test for validity

Zoned decimal numbers are stored as 1-16 8-bit bytes each containing a zone in bits 0-3 and a digit in bits 4-7 The zone of the rightmost byte is interpreted as a sign

Floating point numbers are only stored as fullword or doubleword values on older models On the 36085and 360195there are also extended precision floating point numbers stored as quadwords For all

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 17

SP NOTES

three formats bit 0 is a sign and bits 0-7 are a characteristic (exponent biased by 64) Bits 8-31 (8-63) are a hexadecimal fraction For extended precision the low order doubleword has its own sign and characteristic which are ignored on input and generated on output

Lecture_no9 Various instruction format of IBM 360370

( iv)Instruction

Instructions in the S360 are two four or six bytes in length with the opcode in byte 0 Instructions have one of the following formats

RR (two bytes) R tells that there is data residing in a register So in this type there is data in register R

Generally byte 1 specifies two 4-bit register numbers but in some cases eg SVC byte 1 is a single 8-bit immediate field

RS (four bytes) This instruction can have 2 or 3 operands It depends how many the opcode indicates If it is 2 the first is a register and the second storage If it is 3 then the first and second are registers and the third is a storage location

Byte 1 specifies two register numbers bytes 2-3 specify a base and displacement

RX (four bytes) The X also refers to a storage address however this specifies actual storage addresses in 3 pieces

Byte 1 bits 0-3 specifies either a register number or a modifier byte 1 bits 4-7 specifies the number of the general register to be used as an index bytes 2-3 specify a base and displacement

SI (four bytes) I tells that the operand consists of immediate data (meaning that the data is in the instruction itself) So in this type the first is a storage location and the second is immediate data

Byte 1 specifies an immediate field bytes 2-3 specify a base and displacement

SS (six bytes) S tells that that there is data in storage Here both operands have data in storage

Byte 1 specifies two 4-bit length fields or one 8-bit length field bytes 2-3 and 4-5 each specify a base and displacement The encoding of the length fields is length-1

Instructions must be on a two-byte boundary in memory hence the low-order bit of the instruction address is always 0

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 18

SP NOTES

Addressing Mode

1Implied mode Operand is in a register implied by the OpCode of the instruction ExampleADD 312 Immediate mode Operand is a constant value specified in the instruction itself ExampleADD R 103 Register mode Operand is in register specified in the instruction itself ExampleADD S D

4 Register indirect mode Operand are is in a memory address that is content of a register specifiedby the instruction itself ExampleADD (D) 35 Direct mode Absolute address of operand is explicitly specified in instruction ExampleADD 1234 S6 Indirect mode Absolute address of operand is content of a memory address ExampleADD [1234] 107 Relative mode Content of PC + OpAddr ExampleADD D $S8 Indexed mode Content of an index register + OpAddr ExampleADD D 500(S)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 19

SP NOTES

Lecture_no10 Machine language Long way no looping method

Elementary Instruction Set

Typical Data Transfer Instructions

Typical Arithmetic Instructions

Typical Logical and Bit Manipulation Instr

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 20

SP NOTES

Example Write a program that compare two positive numbers x and y Output is -1 if x lt y 0if x = y +1 if x gt y

MOVE R1 x MOVE R2 y MOVE R3 0 SUB R1 R2 BN Less BZ Zero MOVE R3 +1Less MOVE R3 -1Zero MOVE R3 0 HLT

Lecture_no11 Address modification using instruction as data ampusing index register

Lecture_no12 Looping method with example

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 21

SP NOTES

Lecture_no13 Assembly language machine language

Assembly LanguageAssembly instructions are just shorthand for machine instructions

Machine language Equivalent assembly1000000100100101 LOAD R1 51000000101000101 LOAD R2 51010000100000110 ADD R0 R1 R21000001000000110 SAVE R0 61111111111111111 HALT (For all assembly instructions that compute a result the first argument is the destination)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 22

SP NOTES

Very easy to write an Assembly Language -gtMachine language translator

Exercise

What would be the assembly instructions to swap the contents of registers 1 amp 2Exercise Solution

STORE 1 R1MOVE R1 R2LOAD R2 1HALT

Machine Language

The machine language for a particular computer is tied to the architecture of the CPU For example G4 Macs have a different machine language than Intel PCs We will look at the machine language of a simple simulated computer

Machine Language ExampleOur computer has 4 registers and 32 memory locations Each instruction is 16 bits Here is a machine language program for our simulated computer

10000001001001011000000101000101101000010000011010000010000001101111111111111111

LOAD contents of memorylocation into register

100000010 RR MMMMMex R0 = Mem[3]100000010 00 00011

STORE contents of registerinto memory location

100000100 RR MMMMMex Mem[4] = R0100000100 00 00100

MOVE contents of oneregister into another register

100100010000 RR RRex R0 = R1100100010000 00 01

ADD contents of 2 registersstore result in third

1010000100 RR RR RRex R0 = R1 + R21010000100 00 01 10

SUBTRACT contents of 2registers store result into third

1010001000 RR RR RRex R0 = R1 ndash R2

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 23

SP NOTES

1010001000 00 01 10

Halt the program 1111111111111111

Lecture_no14 Meaning of Assembly language program types Literals

bullUSING

bullBALR

bullSTART

bullEND

bullBR

bullLiteral

bullLTORG

bullBCT

bullEQU ExampleSTART EQU 10

The directive EQU stands for equals it allows you to give a numerical value to a label In this example the assembler would always replace START with 1016 Hence you could write

START EQU 10ORG STARTGOTO 10

bullORG

The directive ORG does not get translated into any code what it does is to set the address at which the next instruction will be stored

Example

ORG 0GOTO 10

The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000

But it will be placed at the address 0 ORG 0 does not get translated into any code but it affects where the code for GOTO 10 is placed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 24

SP NOTES

bullpseudo-ops vs machine-ops

Lecture_no15 Example on Assembly language program

Assembly language is a low-level programming language for a computer or other programmable device specific to a particular computer architecture in contrast to most high-level programming languages which are generally portable across multiple systems Assembly language is converted into executable machine code by a utility program referred to as an assembler like NASM MASM etc

LABEL OPCODE OPERANDS COMMENTS01 02 Program to multiply an integer by the constant 603 Before execution an integer must be stored in NUMBER0405 ORIG x305006 LEA R2 NUMBER07 LDW R2 R2 008 LEA R1 SIX09 LDW R1 R1 00A AND R3R30 Clear R3 It will0B contain the product0C The inner loop0D 0E AGAIN ADD R3R3R20F ADD R1R1-1 R1 keeps track of10 BRp AGAIN the iterations11 12 HALT13

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 25

SP NOTES

14 NUMBER BLKW 115 SIX FILL x000616 17 END

Figure 71 An assembly language programTwo of the parts (LABEL and COMMENTS) are optionalOpcodes and OperandsTwo of the parts (OPCODE and OPERANDS) are mandatory The OPCODE is a symbolic name for the opcode

The number of operands depends on the operation being performedAGAIN ADD R3R3R2The operands to be added are obtained from register 2 and from register 3 The resultis to be placed in register 3 We represent each of the registers 0 through 7 as R0 R2helliphelliphelliphellip R7

The location whose address is to be read is given the label NUMBER The destination into which that address is to be loaded is register 2LEA R2 NUMBERA literal value must contain a symbol identifying the representation base of the number We use for decimal x for hexadecimal and b for binary

LabelsLabels are symbolic names which are used to identify memory locations that are referred to explicitly in the programThere are two reasons for explicitly referring to a memory location1 The location contains the target of a branch instruction (for example AGAIN in line 0E)2 The location contains a value that is loaded or stored (for example NUMBER line 14 and SIX line 15)The location AGAIN is specifically referenced by the branch instruction in line 10

BRp AGAIN

If the result of ADD R1R1ndash1 is positive (as evidenced by the P condition code being set) then the program branches to the location explicitly referenced as AGAIN to perform another iterationThe value stored in the memory location explicitly referenced as NUMBER is loaded into R2If a location in the program is not explicitly referenced then there is no need to give it a label

CommentsComments are messages intended only for human consumption The purpose of comments is to make the program more comprehensible to the human reader

Pseudo-ops (Assembler Directives)

Actually a more formal name for a pseudo-op is assembler directive They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution Rather the pseudo-op is strictly a message to the assembler to help the assembler in the assembly processORIG

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 26

SP NOTES

ORIG tells the assembler where in memory to place the program In line 05 ORIG x3050 says start with location x3050 As a result the LEA R2NUMBER instruction will be put in location x3050FILLFILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand In line 15 the ninth location in the resultant program is initialized to the value x0006BLKWBLKW tells the assembler to set aside some number of sequential memory locations (ie a BLocK ofWords) in the program The actual number is the operand of the BLKW pseudo-op In line 11 the pseudo-op instructs the assembler to set aside one location in memory

Lecture_no16 surprise test

Lecture_no17 MODULE-2

AssemblerWhy learning AssemblerAssembler or other languages that is the question Why should I learn another language if I alreadylearned other programming languages The best argument while you live in France you are able to getthrough by speaking english but you will never feel at home then and life remains complicated You canget through with this but it is rather inappropriate If things need a hurry you should use the countryslanguageAssemblerTo translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmerDisassenblerTo translate machine code to their assembly source

Cross AssemblerThe assembler may run on one machine and produce object code for another

An assembly is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader

(Fig Assembler)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 27

SP NOTES

For example the externally defined symbols (library program) must be indicated to the loaderthe assembler does not know the address of these symbols and it is up to the loader to find theprograms containing them load them into memory and place the values of these symbols in thecalling program

(Fig Program Translation Steps)

Fundamental assembler functions1048698 Translate mnemonic operation codes their machine language equivalents1048698 Assign machine addresses to symbolic labels used by the programmer

Basic Assembler Functions

Convert mnemonic operation codes to machine language equivalents

Convert symbolic operands to machine addresses (pass 1)

Build machine instructions in the proper format

Convert data constants to internal (machine)representations

Error checking is provided

Write the object program and assembly listing files

Assembler directives

-gtThe assembler can also process assembler directivesbull Assembler directives (or pseudo-instructions) provide instructions to the assembler itself Theyare not translated into machine instructions

-gtPseudo-Instructions Not translated into machine instructions Providing information to the assembler-gtBasic assembler directives-

bull START -Specify name and staring address for the program END- Indicate the end of the source program and (optionally) specify the first executable instruction in

the program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 28

SP NOTES

BYTE -Generate character or hexadecimal constant occupying as many bytes as needed to represent the constant

WORD- Generate one-word integer constant RESB- Reserve the indicated number of bytes for a data area RESW- Reserve the indicated number of words for a data area

Lecture_no18 General design procedure of 360-type Assembler

Syntax Assembly language statements

Assembly language statements are entered one statement per line Each statement follows the following format

[label] mnemonic [operands] [comment]

The fields in the square brackets are optional A basic instruction has two parts the first one is the name of the instruction (or the mnemonic) which is to be executed and the second are the operands or the parameters of the command

An assembly language statement contains the following fields -gt Label Field -it is an identifier amp optional fieldThe maximum lengnth of label differs between assemblers It can be used to define a symbolSome accepts upto 32 characters long other only 4 charactersA label when declared is suffixed by a colon and begins with valid characters(AhelliphellipZ)

Ex-

START LDAA 24 H

Here Label START is equal to the address of the instruction LDAA 24 H -gt MnemonicOpcode -This field contains a mnemonic It stands for operation code or pseudo-opie a machine code instruction it requires additional information ieoperands -gt Operand Field -it specifies either the address or the data that the opcode requires -gt Comment Field allows the programmer to document the software This field is optional and is only printed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 29

SP NOTES

on the source listing for documentation purposes The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character

Types of Assembly language statements

It consists of 3 typeseg

(i)Imperative statement

(ii)Declarative statements

(iii)Assembler directive statements

Advantages amp disadvantages of Assembly language

Lecture_no19 Design Specification of 1-pass assembler with example

Following steps are used for design specification of an assembler-1)Identify the information necessary to perform a task2)Specify the data structure and define format of data structure3)Determine the processing necessary to obtain and maintain the information4)Determine the processing necessary to perform a task

Assembler Design can be done in

ndashSingle pass

ndashTwo pass

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 30

Source program

Pass1

OPTAB SYMTAB

Intermediate file

Pass2

SYMTAB

Object codes

OPTAB

SP NOTES

The Intermediate file include each source statement assigned address and error indicator

Pass I

Separate the symbols mnemonic op-code and operational fields Determine the storage requirement for every assembly language statement and up date the location

counter Build the symbol table (Table that is used to store each label and its corresponding value)

Pass II

Generate object code(perform actual translation)

One-Pass Assembler

The translation performed by an assembler is essentially a collection of substitutions machine operation code

for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines

bullSingle Pass AssemblerndashDoes everything in single passndashCannot resolve the forward referencing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 31

LOCCTR

SP NOTES

(detailed pass1 assembler flowchart)

Internal data structures1048698 the Operation Code Table (OPTAB) bull OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents1048698 the Symbol Table (SYMTAB) bull SYMTAB is used to store values (addresses) assigned to labels1048698 Location Counter (LOCCTR) bull This is a variable that is used to help in the assignment of address LOCCTR not LOCCTR + (instruction length)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 32

SP NOTES

Lecture_no20 Multi-pass assembler with example

Pass 1 Assign addresses to all statements in the program Save the values assigned to all labels for use in Pass 2 Perform some processing of assembler directivesPass 2 Assemble instructions Generate data values defined by BYTE WORD Perform processing of assembler directives not done in Pass 1 Write the object program and the assembly listing

(Detailed pass2 assembler flowchart)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 33

SP NOTES

Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler

The difference between one pass and two pass assemblers are-A one pass assembler passes over the source file exactly once in the same pass collecting the labels resolving future references and doing the actual assembly The difficult part is to resolve future label references and assemble code in one pass

A two pass assembler does two passes over the source file ( the second pass can be over a file generated in the first pass ) In the first pass all it does is looks for label definitions and introduces them in the symbol table In the second pass after the symbol table is complete it does the actual assembly by translating the operations and so on

Assembler Design Machine Dependent Assembler Features instruction formats and addressing modes program relocation

Machine Independent Assembler Features literals symbol-defining statements expressions program blocks control sections and program linking

Object Program

Header Col 1 H Col 2~7 Program name Col 8~13 Starting address (hex) Col 14-19 Length of object program in bytes (hex) Text Col1 T Col2~7 Starting address in this record (hex) Col 8~9 Length of object code in this record in bytes (hex) Col 10~69Object code (69-10+1)6=10 instructions End Col1 E Col 2~7 Address of 16 Col2 first executable instruction (hex) (END program_name)

Absolute ProgramThe program must be loaded at the specified (absolute) address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 34

SP NOTES

Relocatable Program

The actual starting address is not known until load time The program can be loaded at different address of memory

How to Solvea Modification recordb Base relativec Program counter relative

Modification record Col 1 M Col 2-7 Starting location of the address field to be modified relative to the beginning of the program Col 8-9 length of the address field to be modified in halfbytes

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 35

SP NOTES

Lecture_no22 Macro language amp Macro processor

MACRO PROCESSOR

The assembly language programmer often finds it necessary to repeat some statements or block of code several times in a program The block may consist of code to swap sets of registers do some arithmetic operations In this situation the programmer find a macro instruction facility useful Macro instruction (often called macros) are single line abbreviation for group of instructions In employing a macro the programmer essentially defines a single instruction to represent a block of code For every occurrence of this one-line macro instruction in his program the macro processing assembler substitute the entire block

A macro represents a commonly used group of statements in the source pro ramming language The macro processor replaces each macro instruction with the corresponding group of source language statements This is called expanding of macros

Obj

Basic Macro Processor Functions

Macro Definition and Usage

In this section we will present one more example to highlight salient aspects of macro-processor The example is very similar to Intels 8 bit microprocessor assembly language instruction

ExampleMACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 36

Source code (with macro)

Macro Processor

Expanded code

Compiler Assembler

SP NOTES

A macro definition is placed at the start of a program enclosed between the statements MACRO and ENDM A MACRO statement indicates that a macro definition starts while statement ENDM indicates the end of a macro definition Thus a group of statements start ing with MACRO and ending with ENDM constitutes one macro definition unit If many macros are to be defined in a program as many definition

modules will exist at the start of the program Each definition module contains a new operation and defines it to consist of a sequence of assembly language statement In example above INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence

The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement Appearance of a macro name in the mnemonic field amounts to a call on the macro The assembler replaces such a statement by the statement sequence comprising the macro This is known as macro expansion

INCRMT XY

is shown to lead to insertion of the assembly statements

LOAD X

ADD Y

STORE X

in its place All macro calls in a program are expanded in this fashion

Defining a Macro

Let us take another look at the macro definition unit

MACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

The macro header statement indicates the existence of a macro definition unit Absence of the header statement as the first statement of a program or ft first statement following a macro definition unit signals the start of the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 37

SP NOTES

main assembly language program The next statement in the definition unit is die prototype for a macro call This statement names the macro and indicates how the operands in any call on the macro would be written

The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion

Basic Macro Processor Functions

-gtDirectives used during usage of Macro

Macro Indicates begin of Macro MEND indicates end of Macro

-gtPrototype for Macro

Each argument starts with Name and macro Parameter list

MEND

-gtBODY The statement will be generated as the expansion of Macro

Macro Expansion

Each macro invocation statement will be expanded into the statements that form the body of the macro

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 38

SP NOTES

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)

In the definition of macro parameter In the macro invocation argument

Lecture_no23 Machine-Independent Macro Processor Features

MACRO INVOCATION

A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro

macro_name p1 p2 hellip

Difference between macro call and procedure call

Macro call statements of the macro body are expanded each time the macro is invoked

Procedure call statements of the subroutine appear only one regardless of how many times the subroutine is called

Macro parameters

i) Positional parameter

The prototype statement indicates how operands in a macro call would be written These operands are called parameters or arguments All parameters used in the prototype statement have names starting with the special character amp These parameters are known as formal parameters A macro call is written using parameter names which do not start with ft special character amp These are known as actual parameters

The lists of formal and actual parameters also called as formal and actual parameter lists specified in the prototype and macro call statements respectively establish a correspondence between each formal parameter and an actual parameter In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists Thus the first actual parameter in the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 39

SP NOTES

fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again

INCRMT ampA ampB prototype

INCRMT XY macro call

We see that X would be paired with ampA and Y with ampB While expanding a macro call any formal parameter appearing within a model statement is replaced by the corresponding actual parameter This is how expansion of the call INCR XY heads to the following statements

LOAD X

ADD Y

STORE X

Example

STRG DATA1 DATA2 DATA3

GENER DIRECT3

(ii) Keyword parameter

Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter

ExampleSTRG ampa3=DATA1 ampa2=DATA2 ampa1=DATA3

GENER TYPE=DIRECT CHANNEL=3

Features of macro facility

(1)Macro instruction argument

(2)Macro calls within macro(nested macro calls)

(3)Macro instruction defining(macros within macro)

(4)Conditional macro ExpansionLecture_no24 Design of 1-pass macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 40

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

2 Components of the Von Neumann Model-gt Memory Storage of information (dataprogram)-gt Processing Unit ComputationProcessing of Information-gt Input Means of getting information into the computer eg keyboard mouse-gtOutput Means of getting information out of the computer eg printer monitor bull Control Unit Makes sure that all the other parts perform their tasks correctly and at the correct time The von Neumann Machine

(connection between CPU amp Main memory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 10

SP NOTES

MBR (Memory Buffer Register) = MBR contains a word to be stored in memory or is sued to receive from memory

MAR (Memory Address Register) = MAR specifies the address in memory of the word to be written from or read into the MBR

IR (Instruction Register) = IR contains the 8-bit op-code instruction being executedIBR (Instruction Buffer Register) = IBR is employed to hold temporarily the right-hand instructions

from a word in memoryPC (Program Counter) = PC contains the address of the next instruction-pair to be fetched from

memoryAC and MQ (Accumulator and Multiplier Quotient) = AC and MQ are employed to hold temporarily

operands and results of ALU operations

Fetch The process of retrieving a value from a memory location in order to put it in a register or pass it to a processing unit

IO controller A special-purpose processor that mediates between the main computer processor and a specific input or output device The processor records the request made by the main processor leaving it free to continue other tasks and interrupts the main processor when the input or output task is complete

Input Output The subsystem responsible for managing communications with external devices whether external memory storage other processors or devices for human interaction

Instruction set The set of codes that describe all the legal operations a particular computer can execute

Op code An unsigned binary integer that is assigned to a specific task the hardware can perform

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 11

SP NOTES

Register A special kind of very fast memory which is referred to by name rather than by a memory address number Registers often hold data values that are currently in use by the ALU or other subsystems of the architecture

Random access memory (RAM) The typical computer memory where each cell is addressed individually and may be updated or accessed in time that is independent of the cellrsquos location

Read-only memory (ROM) A memory that is usually very similar to RAM except it may only be accessed not updated

Store The process of writing a value to a memory cell

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 12

SP NOTES

Lecture_no7 Instruction format with microflowchart for ADD instruction

Flowchart for instruction fetchamp execution of general machine structure

Typical operating system Information flow in microflowchart for ADDSUBMULT

Memory operations

FETCH (addr)

1Put addr into MAR2Tell memory unit to ldquoloadrdquo3Memory copies data into MDR

STORE(addrnew-value)

1 Put addr into MAR2 Put new-value into MDR3 Tell memory unit to ldquostorerdquo4 Memory stores data from MDR into memory cell

Problem 1)

a) Write an assembly program for a GPR machine having 8 general purpose registers (R1R7) to evaluate the following expression Ignore the remainder part of the division process

( (a + b) (a + d) ) (a e)

where abde are variables with addresses ABDE respectively and each operation is defined as follows + --------gt ADD Ri Rj ( Rj lt- Ri + Rj) --------gt MULT Ri Rj ( Rj lt- Ri Rj) --------gt DIV Ri Rj ( Rj lt- Rj Ri)

a = M[A] ------gt content of memory location at address A same for B D E

move A D0 D0 = amove B D4 D4 = badd D0 D4 D4 = a + bmove D D1 D1 = dadd D0 D1 D1 = a + ddiv D1D4 D4 = (a + b) (a + d)move E D2 D2 = emult D0 D2 D2 = (a e)mult D4 D2 D2 = ( (a + b) (a + d) ) (a e)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 13

SP NOTES

Problem 2)

a) Provide micro-instructions in the form of detailed flowchart for the execution of the following instructions

b) Show the flow of the micro-instructions within the GPR architecture by highlighting the system buses and the registers used

i)move X D0 move contents of memory location X to register D0ii)move D0 X move contents of register D0 to memory location Xiii)movea 22(a3)25(a1) move contents from memory to memoryiv)bgtw label branch if greater than to location ldquolabelrdquo

Answer 2)

For each of the above for part b) draw the CPU consisting of all the registers (for example MAR MBR PC IR Address and Data Registers ALU) and Memory and show each of the micro-instruction as in part a)

i) move X D0Machine Instruction

0011 0000 0011 1000

Address of X

a) Micro-steps are

MAR lt-- PC

MBR lt-- M[MAR] fetch

IR lt-- MBR

| decode |PC lt-- PC + 2 PC points to source addressMAR lt-- PCMBR lt-- M[MAR] address of X in MBRMAR MBR move X to MARMBR M[MAR] read contents of location XD0 MBR PC lt-- PC + 2 PC points to next instruction

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 14

SP NOTES

ii) move D0 X

Machine Instruction

0011 1110 0000 0000

Address of X

a) Micro-steps areMAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | decode |PC lt-- PC + 2 PC points to destination addressMAR lt-- PCMBR M[MAR] read destination address XMAR MBR MAR has X now

MBR D0 source data in MBR

M[MAR] MBR data in Effective address X (which is in MAR)

PC lt-- PC + 2 PC points to next instruction

iii) movea 22(a3)25(a1)

Machine Instruction

0011 0011 0110 1011

0000 0000 0001 0110

0000 0000 0001 1001

a)Source Effective address = contents of a3 + $16temp lt--- M[a3 + $16]

Destination Effective address = contents of a1 + $19M[a1 + $19] lt- temp

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 15

SP NOTES

Micro-steps areMAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | decode |PC lt-- PC + 2 PC points to source displacementMAR lt-- PCMBR lt-- M[MAR] source displacement loadedMAR lt-- [A3 + MBR] source effective address calculatedMBR lt- M[MAR] source data in MBRWR lt- MBR saved temporarily since MBR is to be reusedPC lt-- PC + 2 PC points to destination displacementMAR lt-- PCMBR lt-- M[MAR] destination displacement loaded

MAR lt-- [A1 + MBR] destination effective address calculatedMBR lt- WR source data back to MBR

M[MAR] lt-- MBR source data moved to destination

PC lt-- PC + 2 PC points to next instruction

iv) bgtw label

Machine Instruction

0110 1110 0000 0000 ---- displacement ----

a)MAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | interpret decode |

TRUE FALSE | |

PC lt-- PC + 2 PC lt- PC + 4 MAR lt-- PC MBR lt-- M[MAR] get displacement PC lt-- PC + MBR execute

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 16

SP NOTES

Lecture_no8 Memory unit Register Data of IBM 360

General machine structure of IBM 360370

The IBM System360 (S360) was a mainframe computer system family announced by IBM on April 7 1964 and delivered between 1965 and 1978 It was the first family of computers designed to cover the complete range of applications from small to large both commercial and scientific

(i)Memory unit

Memory (storage) in System360 is addressed in terms of 8-bit bytes Various instructions operate on larger units called halfword (2 bytes) fullword (4 bytes) doubleword (8 bytes) quad word (16 bytes) and 2048 byte storage block specifying the leftmost (lowest address) of the unit Within a halfword fullword doubleword or quadword low numbered bytes are more significant than high numbered bytes this is sometimes referred to as big-endian Many uses for these units require aligning them on the corresponding boundaries Within this article the unqualified term word refers to a fullword

The architecture of System360 provided for up to 224 = 16777216 bytes of memory however the Model 67 extended the architecture and allowed 232 = 4294967296 bytes of virtual memory

(ii)Register-

Registers are areas of storage that are set aside in the processing unit

There are 16 registers in the 370 system (General purpose registerFloating-point registerControl amp status registerdata register address register)

Four fields of information-1) the name field- optional field and must begin with a character from the alphabet can consist of 8 characters2) opcode ndash mandatory field and must be followed by a blank3) operands ndash usually represent data that could be in main storage registers or as part of the instruction itself4) comments ndash used to clarify instructions

(iii)Data

Characters are stored as 8-bit bytes Integers are stored as twos complement binary halfword or fullword values Packed decimal numbers are stored as 1-16 8-bit bytes containing an odd number of decimal digits

followed by a 4-bit sign Sign values of hexadecimal A C E and F are positive and sign values of hexadecimal B and D are negative Digit values of hexadecimal A-F and sign values of 0-9 are invalid but the PACK and UNPK instructions do not test for validity

Zoned decimal numbers are stored as 1-16 8-bit bytes each containing a zone in bits 0-3 and a digit in bits 4-7 The zone of the rightmost byte is interpreted as a sign

Floating point numbers are only stored as fullword or doubleword values on older models On the 36085and 360195there are also extended precision floating point numbers stored as quadwords For all

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 17

SP NOTES

three formats bit 0 is a sign and bits 0-7 are a characteristic (exponent biased by 64) Bits 8-31 (8-63) are a hexadecimal fraction For extended precision the low order doubleword has its own sign and characteristic which are ignored on input and generated on output

Lecture_no9 Various instruction format of IBM 360370

( iv)Instruction

Instructions in the S360 are two four or six bytes in length with the opcode in byte 0 Instructions have one of the following formats

RR (two bytes) R tells that there is data residing in a register So in this type there is data in register R

Generally byte 1 specifies two 4-bit register numbers but in some cases eg SVC byte 1 is a single 8-bit immediate field

RS (four bytes) This instruction can have 2 or 3 operands It depends how many the opcode indicates If it is 2 the first is a register and the second storage If it is 3 then the first and second are registers and the third is a storage location

Byte 1 specifies two register numbers bytes 2-3 specify a base and displacement

RX (four bytes) The X also refers to a storage address however this specifies actual storage addresses in 3 pieces

Byte 1 bits 0-3 specifies either a register number or a modifier byte 1 bits 4-7 specifies the number of the general register to be used as an index bytes 2-3 specify a base and displacement

SI (four bytes) I tells that the operand consists of immediate data (meaning that the data is in the instruction itself) So in this type the first is a storage location and the second is immediate data

Byte 1 specifies an immediate field bytes 2-3 specify a base and displacement

SS (six bytes) S tells that that there is data in storage Here both operands have data in storage

Byte 1 specifies two 4-bit length fields or one 8-bit length field bytes 2-3 and 4-5 each specify a base and displacement The encoding of the length fields is length-1

Instructions must be on a two-byte boundary in memory hence the low-order bit of the instruction address is always 0

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 18

SP NOTES

Addressing Mode

1Implied mode Operand is in a register implied by the OpCode of the instruction ExampleADD 312 Immediate mode Operand is a constant value specified in the instruction itself ExampleADD R 103 Register mode Operand is in register specified in the instruction itself ExampleADD S D

4 Register indirect mode Operand are is in a memory address that is content of a register specifiedby the instruction itself ExampleADD (D) 35 Direct mode Absolute address of operand is explicitly specified in instruction ExampleADD 1234 S6 Indirect mode Absolute address of operand is content of a memory address ExampleADD [1234] 107 Relative mode Content of PC + OpAddr ExampleADD D $S8 Indexed mode Content of an index register + OpAddr ExampleADD D 500(S)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 19

SP NOTES

Lecture_no10 Machine language Long way no looping method

Elementary Instruction Set

Typical Data Transfer Instructions

Typical Arithmetic Instructions

Typical Logical and Bit Manipulation Instr

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 20

SP NOTES

Example Write a program that compare two positive numbers x and y Output is -1 if x lt y 0if x = y +1 if x gt y

MOVE R1 x MOVE R2 y MOVE R3 0 SUB R1 R2 BN Less BZ Zero MOVE R3 +1Less MOVE R3 -1Zero MOVE R3 0 HLT

Lecture_no11 Address modification using instruction as data ampusing index register

Lecture_no12 Looping method with example

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 21

SP NOTES

Lecture_no13 Assembly language machine language

Assembly LanguageAssembly instructions are just shorthand for machine instructions

Machine language Equivalent assembly1000000100100101 LOAD R1 51000000101000101 LOAD R2 51010000100000110 ADD R0 R1 R21000001000000110 SAVE R0 61111111111111111 HALT (For all assembly instructions that compute a result the first argument is the destination)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 22

SP NOTES

Very easy to write an Assembly Language -gtMachine language translator

Exercise

What would be the assembly instructions to swap the contents of registers 1 amp 2Exercise Solution

STORE 1 R1MOVE R1 R2LOAD R2 1HALT

Machine Language

The machine language for a particular computer is tied to the architecture of the CPU For example G4 Macs have a different machine language than Intel PCs We will look at the machine language of a simple simulated computer

Machine Language ExampleOur computer has 4 registers and 32 memory locations Each instruction is 16 bits Here is a machine language program for our simulated computer

10000001001001011000000101000101101000010000011010000010000001101111111111111111

LOAD contents of memorylocation into register

100000010 RR MMMMMex R0 = Mem[3]100000010 00 00011

STORE contents of registerinto memory location

100000100 RR MMMMMex Mem[4] = R0100000100 00 00100

MOVE contents of oneregister into another register

100100010000 RR RRex R0 = R1100100010000 00 01

ADD contents of 2 registersstore result in third

1010000100 RR RR RRex R0 = R1 + R21010000100 00 01 10

SUBTRACT contents of 2registers store result into third

1010001000 RR RR RRex R0 = R1 ndash R2

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 23

SP NOTES

1010001000 00 01 10

Halt the program 1111111111111111

Lecture_no14 Meaning of Assembly language program types Literals

bullUSING

bullBALR

bullSTART

bullEND

bullBR

bullLiteral

bullLTORG

bullBCT

bullEQU ExampleSTART EQU 10

The directive EQU stands for equals it allows you to give a numerical value to a label In this example the assembler would always replace START with 1016 Hence you could write

START EQU 10ORG STARTGOTO 10

bullORG

The directive ORG does not get translated into any code what it does is to set the address at which the next instruction will be stored

Example

ORG 0GOTO 10

The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000

But it will be placed at the address 0 ORG 0 does not get translated into any code but it affects where the code for GOTO 10 is placed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 24

SP NOTES

bullpseudo-ops vs machine-ops

Lecture_no15 Example on Assembly language program

Assembly language is a low-level programming language for a computer or other programmable device specific to a particular computer architecture in contrast to most high-level programming languages which are generally portable across multiple systems Assembly language is converted into executable machine code by a utility program referred to as an assembler like NASM MASM etc

LABEL OPCODE OPERANDS COMMENTS01 02 Program to multiply an integer by the constant 603 Before execution an integer must be stored in NUMBER0405 ORIG x305006 LEA R2 NUMBER07 LDW R2 R2 008 LEA R1 SIX09 LDW R1 R1 00A AND R3R30 Clear R3 It will0B contain the product0C The inner loop0D 0E AGAIN ADD R3R3R20F ADD R1R1-1 R1 keeps track of10 BRp AGAIN the iterations11 12 HALT13

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 25

SP NOTES

14 NUMBER BLKW 115 SIX FILL x000616 17 END

Figure 71 An assembly language programTwo of the parts (LABEL and COMMENTS) are optionalOpcodes and OperandsTwo of the parts (OPCODE and OPERANDS) are mandatory The OPCODE is a symbolic name for the opcode

The number of operands depends on the operation being performedAGAIN ADD R3R3R2The operands to be added are obtained from register 2 and from register 3 The resultis to be placed in register 3 We represent each of the registers 0 through 7 as R0 R2helliphelliphelliphellip R7

The location whose address is to be read is given the label NUMBER The destination into which that address is to be loaded is register 2LEA R2 NUMBERA literal value must contain a symbol identifying the representation base of the number We use for decimal x for hexadecimal and b for binary

LabelsLabels are symbolic names which are used to identify memory locations that are referred to explicitly in the programThere are two reasons for explicitly referring to a memory location1 The location contains the target of a branch instruction (for example AGAIN in line 0E)2 The location contains a value that is loaded or stored (for example NUMBER line 14 and SIX line 15)The location AGAIN is specifically referenced by the branch instruction in line 10

BRp AGAIN

If the result of ADD R1R1ndash1 is positive (as evidenced by the P condition code being set) then the program branches to the location explicitly referenced as AGAIN to perform another iterationThe value stored in the memory location explicitly referenced as NUMBER is loaded into R2If a location in the program is not explicitly referenced then there is no need to give it a label

CommentsComments are messages intended only for human consumption The purpose of comments is to make the program more comprehensible to the human reader

Pseudo-ops (Assembler Directives)

Actually a more formal name for a pseudo-op is assembler directive They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution Rather the pseudo-op is strictly a message to the assembler to help the assembler in the assembly processORIG

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 26

SP NOTES

ORIG tells the assembler where in memory to place the program In line 05 ORIG x3050 says start with location x3050 As a result the LEA R2NUMBER instruction will be put in location x3050FILLFILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand In line 15 the ninth location in the resultant program is initialized to the value x0006BLKWBLKW tells the assembler to set aside some number of sequential memory locations (ie a BLocK ofWords) in the program The actual number is the operand of the BLKW pseudo-op In line 11 the pseudo-op instructs the assembler to set aside one location in memory

Lecture_no16 surprise test

Lecture_no17 MODULE-2

AssemblerWhy learning AssemblerAssembler or other languages that is the question Why should I learn another language if I alreadylearned other programming languages The best argument while you live in France you are able to getthrough by speaking english but you will never feel at home then and life remains complicated You canget through with this but it is rather inappropriate If things need a hurry you should use the countryslanguageAssemblerTo translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmerDisassenblerTo translate machine code to their assembly source

Cross AssemblerThe assembler may run on one machine and produce object code for another

An assembly is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader

(Fig Assembler)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 27

SP NOTES

For example the externally defined symbols (library program) must be indicated to the loaderthe assembler does not know the address of these symbols and it is up to the loader to find theprograms containing them load them into memory and place the values of these symbols in thecalling program

(Fig Program Translation Steps)

Fundamental assembler functions1048698 Translate mnemonic operation codes their machine language equivalents1048698 Assign machine addresses to symbolic labels used by the programmer

Basic Assembler Functions

Convert mnemonic operation codes to machine language equivalents

Convert symbolic operands to machine addresses (pass 1)

Build machine instructions in the proper format

Convert data constants to internal (machine)representations

Error checking is provided

Write the object program and assembly listing files

Assembler directives

-gtThe assembler can also process assembler directivesbull Assembler directives (or pseudo-instructions) provide instructions to the assembler itself Theyare not translated into machine instructions

-gtPseudo-Instructions Not translated into machine instructions Providing information to the assembler-gtBasic assembler directives-

bull START -Specify name and staring address for the program END- Indicate the end of the source program and (optionally) specify the first executable instruction in

the program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 28

SP NOTES

BYTE -Generate character or hexadecimal constant occupying as many bytes as needed to represent the constant

WORD- Generate one-word integer constant RESB- Reserve the indicated number of bytes for a data area RESW- Reserve the indicated number of words for a data area

Lecture_no18 General design procedure of 360-type Assembler

Syntax Assembly language statements

Assembly language statements are entered one statement per line Each statement follows the following format

[label] mnemonic [operands] [comment]

The fields in the square brackets are optional A basic instruction has two parts the first one is the name of the instruction (or the mnemonic) which is to be executed and the second are the operands or the parameters of the command

An assembly language statement contains the following fields -gt Label Field -it is an identifier amp optional fieldThe maximum lengnth of label differs between assemblers It can be used to define a symbolSome accepts upto 32 characters long other only 4 charactersA label when declared is suffixed by a colon and begins with valid characters(AhelliphellipZ)

Ex-

START LDAA 24 H

Here Label START is equal to the address of the instruction LDAA 24 H -gt MnemonicOpcode -This field contains a mnemonic It stands for operation code or pseudo-opie a machine code instruction it requires additional information ieoperands -gt Operand Field -it specifies either the address or the data that the opcode requires -gt Comment Field allows the programmer to document the software This field is optional and is only printed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 29

SP NOTES

on the source listing for documentation purposes The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character

Types of Assembly language statements

It consists of 3 typeseg

(i)Imperative statement

(ii)Declarative statements

(iii)Assembler directive statements

Advantages amp disadvantages of Assembly language

Lecture_no19 Design Specification of 1-pass assembler with example

Following steps are used for design specification of an assembler-1)Identify the information necessary to perform a task2)Specify the data structure and define format of data structure3)Determine the processing necessary to obtain and maintain the information4)Determine the processing necessary to perform a task

Assembler Design can be done in

ndashSingle pass

ndashTwo pass

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 30

Source program

Pass1

OPTAB SYMTAB

Intermediate file

Pass2

SYMTAB

Object codes

OPTAB

SP NOTES

The Intermediate file include each source statement assigned address and error indicator

Pass I

Separate the symbols mnemonic op-code and operational fields Determine the storage requirement for every assembly language statement and up date the location

counter Build the symbol table (Table that is used to store each label and its corresponding value)

Pass II

Generate object code(perform actual translation)

One-Pass Assembler

The translation performed by an assembler is essentially a collection of substitutions machine operation code

for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines

bullSingle Pass AssemblerndashDoes everything in single passndashCannot resolve the forward referencing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 31

LOCCTR

SP NOTES

(detailed pass1 assembler flowchart)

Internal data structures1048698 the Operation Code Table (OPTAB) bull OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents1048698 the Symbol Table (SYMTAB) bull SYMTAB is used to store values (addresses) assigned to labels1048698 Location Counter (LOCCTR) bull This is a variable that is used to help in the assignment of address LOCCTR not LOCCTR + (instruction length)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 32

SP NOTES

Lecture_no20 Multi-pass assembler with example

Pass 1 Assign addresses to all statements in the program Save the values assigned to all labels for use in Pass 2 Perform some processing of assembler directivesPass 2 Assemble instructions Generate data values defined by BYTE WORD Perform processing of assembler directives not done in Pass 1 Write the object program and the assembly listing

(Detailed pass2 assembler flowchart)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 33

SP NOTES

Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler

The difference between one pass and two pass assemblers are-A one pass assembler passes over the source file exactly once in the same pass collecting the labels resolving future references and doing the actual assembly The difficult part is to resolve future label references and assemble code in one pass

A two pass assembler does two passes over the source file ( the second pass can be over a file generated in the first pass ) In the first pass all it does is looks for label definitions and introduces them in the symbol table In the second pass after the symbol table is complete it does the actual assembly by translating the operations and so on

Assembler Design Machine Dependent Assembler Features instruction formats and addressing modes program relocation

Machine Independent Assembler Features literals symbol-defining statements expressions program blocks control sections and program linking

Object Program

Header Col 1 H Col 2~7 Program name Col 8~13 Starting address (hex) Col 14-19 Length of object program in bytes (hex) Text Col1 T Col2~7 Starting address in this record (hex) Col 8~9 Length of object code in this record in bytes (hex) Col 10~69Object code (69-10+1)6=10 instructions End Col1 E Col 2~7 Address of 16 Col2 first executable instruction (hex) (END program_name)

Absolute ProgramThe program must be loaded at the specified (absolute) address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 34

SP NOTES

Relocatable Program

The actual starting address is not known until load time The program can be loaded at different address of memory

How to Solvea Modification recordb Base relativec Program counter relative

Modification record Col 1 M Col 2-7 Starting location of the address field to be modified relative to the beginning of the program Col 8-9 length of the address field to be modified in halfbytes

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 35

SP NOTES

Lecture_no22 Macro language amp Macro processor

MACRO PROCESSOR

The assembly language programmer often finds it necessary to repeat some statements or block of code several times in a program The block may consist of code to swap sets of registers do some arithmetic operations In this situation the programmer find a macro instruction facility useful Macro instruction (often called macros) are single line abbreviation for group of instructions In employing a macro the programmer essentially defines a single instruction to represent a block of code For every occurrence of this one-line macro instruction in his program the macro processing assembler substitute the entire block

A macro represents a commonly used group of statements in the source pro ramming language The macro processor replaces each macro instruction with the corresponding group of source language statements This is called expanding of macros

Obj

Basic Macro Processor Functions

Macro Definition and Usage

In this section we will present one more example to highlight salient aspects of macro-processor The example is very similar to Intels 8 bit microprocessor assembly language instruction

ExampleMACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 36

Source code (with macro)

Macro Processor

Expanded code

Compiler Assembler

SP NOTES

A macro definition is placed at the start of a program enclosed between the statements MACRO and ENDM A MACRO statement indicates that a macro definition starts while statement ENDM indicates the end of a macro definition Thus a group of statements start ing with MACRO and ending with ENDM constitutes one macro definition unit If many macros are to be defined in a program as many definition

modules will exist at the start of the program Each definition module contains a new operation and defines it to consist of a sequence of assembly language statement In example above INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence

The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement Appearance of a macro name in the mnemonic field amounts to a call on the macro The assembler replaces such a statement by the statement sequence comprising the macro This is known as macro expansion

INCRMT XY

is shown to lead to insertion of the assembly statements

LOAD X

ADD Y

STORE X

in its place All macro calls in a program are expanded in this fashion

Defining a Macro

Let us take another look at the macro definition unit

MACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

The macro header statement indicates the existence of a macro definition unit Absence of the header statement as the first statement of a program or ft first statement following a macro definition unit signals the start of the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 37

SP NOTES

main assembly language program The next statement in the definition unit is die prototype for a macro call This statement names the macro and indicates how the operands in any call on the macro would be written

The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion

Basic Macro Processor Functions

-gtDirectives used during usage of Macro

Macro Indicates begin of Macro MEND indicates end of Macro

-gtPrototype for Macro

Each argument starts with Name and macro Parameter list

MEND

-gtBODY The statement will be generated as the expansion of Macro

Macro Expansion

Each macro invocation statement will be expanded into the statements that form the body of the macro

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 38

SP NOTES

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)

In the definition of macro parameter In the macro invocation argument

Lecture_no23 Machine-Independent Macro Processor Features

MACRO INVOCATION

A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro

macro_name p1 p2 hellip

Difference between macro call and procedure call

Macro call statements of the macro body are expanded each time the macro is invoked

Procedure call statements of the subroutine appear only one regardless of how many times the subroutine is called

Macro parameters

i) Positional parameter

The prototype statement indicates how operands in a macro call would be written These operands are called parameters or arguments All parameters used in the prototype statement have names starting with the special character amp These parameters are known as formal parameters A macro call is written using parameter names which do not start with ft special character amp These are known as actual parameters

The lists of formal and actual parameters also called as formal and actual parameter lists specified in the prototype and macro call statements respectively establish a correspondence between each formal parameter and an actual parameter In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists Thus the first actual parameter in the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 39

SP NOTES

fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again

INCRMT ampA ampB prototype

INCRMT XY macro call

We see that X would be paired with ampA and Y with ampB While expanding a macro call any formal parameter appearing within a model statement is replaced by the corresponding actual parameter This is how expansion of the call INCR XY heads to the following statements

LOAD X

ADD Y

STORE X

Example

STRG DATA1 DATA2 DATA3

GENER DIRECT3

(ii) Keyword parameter

Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter

ExampleSTRG ampa3=DATA1 ampa2=DATA2 ampa1=DATA3

GENER TYPE=DIRECT CHANNEL=3

Features of macro facility

(1)Macro instruction argument

(2)Macro calls within macro(nested macro calls)

(3)Macro instruction defining(macros within macro)

(4)Conditional macro ExpansionLecture_no24 Design of 1-pass macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 40

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

MBR (Memory Buffer Register) = MBR contains a word to be stored in memory or is sued to receive from memory

MAR (Memory Address Register) = MAR specifies the address in memory of the word to be written from or read into the MBR

IR (Instruction Register) = IR contains the 8-bit op-code instruction being executedIBR (Instruction Buffer Register) = IBR is employed to hold temporarily the right-hand instructions

from a word in memoryPC (Program Counter) = PC contains the address of the next instruction-pair to be fetched from

memoryAC and MQ (Accumulator and Multiplier Quotient) = AC and MQ are employed to hold temporarily

operands and results of ALU operations

Fetch The process of retrieving a value from a memory location in order to put it in a register or pass it to a processing unit

IO controller A special-purpose processor that mediates between the main computer processor and a specific input or output device The processor records the request made by the main processor leaving it free to continue other tasks and interrupts the main processor when the input or output task is complete

Input Output The subsystem responsible for managing communications with external devices whether external memory storage other processors or devices for human interaction

Instruction set The set of codes that describe all the legal operations a particular computer can execute

Op code An unsigned binary integer that is assigned to a specific task the hardware can perform

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 11

SP NOTES

Register A special kind of very fast memory which is referred to by name rather than by a memory address number Registers often hold data values that are currently in use by the ALU or other subsystems of the architecture

Random access memory (RAM) The typical computer memory where each cell is addressed individually and may be updated or accessed in time that is independent of the cellrsquos location

Read-only memory (ROM) A memory that is usually very similar to RAM except it may only be accessed not updated

Store The process of writing a value to a memory cell

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 12

SP NOTES

Lecture_no7 Instruction format with microflowchart for ADD instruction

Flowchart for instruction fetchamp execution of general machine structure

Typical operating system Information flow in microflowchart for ADDSUBMULT

Memory operations

FETCH (addr)

1Put addr into MAR2Tell memory unit to ldquoloadrdquo3Memory copies data into MDR

STORE(addrnew-value)

1 Put addr into MAR2 Put new-value into MDR3 Tell memory unit to ldquostorerdquo4 Memory stores data from MDR into memory cell

Problem 1)

a) Write an assembly program for a GPR machine having 8 general purpose registers (R1R7) to evaluate the following expression Ignore the remainder part of the division process

( (a + b) (a + d) ) (a e)

where abde are variables with addresses ABDE respectively and each operation is defined as follows + --------gt ADD Ri Rj ( Rj lt- Ri + Rj) --------gt MULT Ri Rj ( Rj lt- Ri Rj) --------gt DIV Ri Rj ( Rj lt- Rj Ri)

a = M[A] ------gt content of memory location at address A same for B D E

move A D0 D0 = amove B D4 D4 = badd D0 D4 D4 = a + bmove D D1 D1 = dadd D0 D1 D1 = a + ddiv D1D4 D4 = (a + b) (a + d)move E D2 D2 = emult D0 D2 D2 = (a e)mult D4 D2 D2 = ( (a + b) (a + d) ) (a e)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 13

SP NOTES

Problem 2)

a) Provide micro-instructions in the form of detailed flowchart for the execution of the following instructions

b) Show the flow of the micro-instructions within the GPR architecture by highlighting the system buses and the registers used

i)move X D0 move contents of memory location X to register D0ii)move D0 X move contents of register D0 to memory location Xiii)movea 22(a3)25(a1) move contents from memory to memoryiv)bgtw label branch if greater than to location ldquolabelrdquo

Answer 2)

For each of the above for part b) draw the CPU consisting of all the registers (for example MAR MBR PC IR Address and Data Registers ALU) and Memory and show each of the micro-instruction as in part a)

i) move X D0Machine Instruction

0011 0000 0011 1000

Address of X

a) Micro-steps are

MAR lt-- PC

MBR lt-- M[MAR] fetch

IR lt-- MBR

| decode |PC lt-- PC + 2 PC points to source addressMAR lt-- PCMBR lt-- M[MAR] address of X in MBRMAR MBR move X to MARMBR M[MAR] read contents of location XD0 MBR PC lt-- PC + 2 PC points to next instruction

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 14

SP NOTES

ii) move D0 X

Machine Instruction

0011 1110 0000 0000

Address of X

a) Micro-steps areMAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | decode |PC lt-- PC + 2 PC points to destination addressMAR lt-- PCMBR M[MAR] read destination address XMAR MBR MAR has X now

MBR D0 source data in MBR

M[MAR] MBR data in Effective address X (which is in MAR)

PC lt-- PC + 2 PC points to next instruction

iii) movea 22(a3)25(a1)

Machine Instruction

0011 0011 0110 1011

0000 0000 0001 0110

0000 0000 0001 1001

a)Source Effective address = contents of a3 + $16temp lt--- M[a3 + $16]

Destination Effective address = contents of a1 + $19M[a1 + $19] lt- temp

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 15

SP NOTES

Micro-steps areMAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | decode |PC lt-- PC + 2 PC points to source displacementMAR lt-- PCMBR lt-- M[MAR] source displacement loadedMAR lt-- [A3 + MBR] source effective address calculatedMBR lt- M[MAR] source data in MBRWR lt- MBR saved temporarily since MBR is to be reusedPC lt-- PC + 2 PC points to destination displacementMAR lt-- PCMBR lt-- M[MAR] destination displacement loaded

MAR lt-- [A1 + MBR] destination effective address calculatedMBR lt- WR source data back to MBR

M[MAR] lt-- MBR source data moved to destination

PC lt-- PC + 2 PC points to next instruction

iv) bgtw label

Machine Instruction

0110 1110 0000 0000 ---- displacement ----

a)MAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | interpret decode |

TRUE FALSE | |

PC lt-- PC + 2 PC lt- PC + 4 MAR lt-- PC MBR lt-- M[MAR] get displacement PC lt-- PC + MBR execute

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 16

SP NOTES

Lecture_no8 Memory unit Register Data of IBM 360

General machine structure of IBM 360370

The IBM System360 (S360) was a mainframe computer system family announced by IBM on April 7 1964 and delivered between 1965 and 1978 It was the first family of computers designed to cover the complete range of applications from small to large both commercial and scientific

(i)Memory unit

Memory (storage) in System360 is addressed in terms of 8-bit bytes Various instructions operate on larger units called halfword (2 bytes) fullword (4 bytes) doubleword (8 bytes) quad word (16 bytes) and 2048 byte storage block specifying the leftmost (lowest address) of the unit Within a halfword fullword doubleword or quadword low numbered bytes are more significant than high numbered bytes this is sometimes referred to as big-endian Many uses for these units require aligning them on the corresponding boundaries Within this article the unqualified term word refers to a fullword

The architecture of System360 provided for up to 224 = 16777216 bytes of memory however the Model 67 extended the architecture and allowed 232 = 4294967296 bytes of virtual memory

(ii)Register-

Registers are areas of storage that are set aside in the processing unit

There are 16 registers in the 370 system (General purpose registerFloating-point registerControl amp status registerdata register address register)

Four fields of information-1) the name field- optional field and must begin with a character from the alphabet can consist of 8 characters2) opcode ndash mandatory field and must be followed by a blank3) operands ndash usually represent data that could be in main storage registers or as part of the instruction itself4) comments ndash used to clarify instructions

(iii)Data

Characters are stored as 8-bit bytes Integers are stored as twos complement binary halfword or fullword values Packed decimal numbers are stored as 1-16 8-bit bytes containing an odd number of decimal digits

followed by a 4-bit sign Sign values of hexadecimal A C E and F are positive and sign values of hexadecimal B and D are negative Digit values of hexadecimal A-F and sign values of 0-9 are invalid but the PACK and UNPK instructions do not test for validity

Zoned decimal numbers are stored as 1-16 8-bit bytes each containing a zone in bits 0-3 and a digit in bits 4-7 The zone of the rightmost byte is interpreted as a sign

Floating point numbers are only stored as fullword or doubleword values on older models On the 36085and 360195there are also extended precision floating point numbers stored as quadwords For all

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 17

SP NOTES

three formats bit 0 is a sign and bits 0-7 are a characteristic (exponent biased by 64) Bits 8-31 (8-63) are a hexadecimal fraction For extended precision the low order doubleword has its own sign and characteristic which are ignored on input and generated on output

Lecture_no9 Various instruction format of IBM 360370

( iv)Instruction

Instructions in the S360 are two four or six bytes in length with the opcode in byte 0 Instructions have one of the following formats

RR (two bytes) R tells that there is data residing in a register So in this type there is data in register R

Generally byte 1 specifies two 4-bit register numbers but in some cases eg SVC byte 1 is a single 8-bit immediate field

RS (four bytes) This instruction can have 2 or 3 operands It depends how many the opcode indicates If it is 2 the first is a register and the second storage If it is 3 then the first and second are registers and the third is a storage location

Byte 1 specifies two register numbers bytes 2-3 specify a base and displacement

RX (four bytes) The X also refers to a storage address however this specifies actual storage addresses in 3 pieces

Byte 1 bits 0-3 specifies either a register number or a modifier byte 1 bits 4-7 specifies the number of the general register to be used as an index bytes 2-3 specify a base and displacement

SI (four bytes) I tells that the operand consists of immediate data (meaning that the data is in the instruction itself) So in this type the first is a storage location and the second is immediate data

Byte 1 specifies an immediate field bytes 2-3 specify a base and displacement

SS (six bytes) S tells that that there is data in storage Here both operands have data in storage

Byte 1 specifies two 4-bit length fields or one 8-bit length field bytes 2-3 and 4-5 each specify a base and displacement The encoding of the length fields is length-1

Instructions must be on a two-byte boundary in memory hence the low-order bit of the instruction address is always 0

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 18

SP NOTES

Addressing Mode

1Implied mode Operand is in a register implied by the OpCode of the instruction ExampleADD 312 Immediate mode Operand is a constant value specified in the instruction itself ExampleADD R 103 Register mode Operand is in register specified in the instruction itself ExampleADD S D

4 Register indirect mode Operand are is in a memory address that is content of a register specifiedby the instruction itself ExampleADD (D) 35 Direct mode Absolute address of operand is explicitly specified in instruction ExampleADD 1234 S6 Indirect mode Absolute address of operand is content of a memory address ExampleADD [1234] 107 Relative mode Content of PC + OpAddr ExampleADD D $S8 Indexed mode Content of an index register + OpAddr ExampleADD D 500(S)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 19

SP NOTES

Lecture_no10 Machine language Long way no looping method

Elementary Instruction Set

Typical Data Transfer Instructions

Typical Arithmetic Instructions

Typical Logical and Bit Manipulation Instr

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 20

SP NOTES

Example Write a program that compare two positive numbers x and y Output is -1 if x lt y 0if x = y +1 if x gt y

MOVE R1 x MOVE R2 y MOVE R3 0 SUB R1 R2 BN Less BZ Zero MOVE R3 +1Less MOVE R3 -1Zero MOVE R3 0 HLT

Lecture_no11 Address modification using instruction as data ampusing index register

Lecture_no12 Looping method with example

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 21

SP NOTES

Lecture_no13 Assembly language machine language

Assembly LanguageAssembly instructions are just shorthand for machine instructions

Machine language Equivalent assembly1000000100100101 LOAD R1 51000000101000101 LOAD R2 51010000100000110 ADD R0 R1 R21000001000000110 SAVE R0 61111111111111111 HALT (For all assembly instructions that compute a result the first argument is the destination)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 22

SP NOTES

Very easy to write an Assembly Language -gtMachine language translator

Exercise

What would be the assembly instructions to swap the contents of registers 1 amp 2Exercise Solution

STORE 1 R1MOVE R1 R2LOAD R2 1HALT

Machine Language

The machine language for a particular computer is tied to the architecture of the CPU For example G4 Macs have a different machine language than Intel PCs We will look at the machine language of a simple simulated computer

Machine Language ExampleOur computer has 4 registers and 32 memory locations Each instruction is 16 bits Here is a machine language program for our simulated computer

10000001001001011000000101000101101000010000011010000010000001101111111111111111

LOAD contents of memorylocation into register

100000010 RR MMMMMex R0 = Mem[3]100000010 00 00011

STORE contents of registerinto memory location

100000100 RR MMMMMex Mem[4] = R0100000100 00 00100

MOVE contents of oneregister into another register

100100010000 RR RRex R0 = R1100100010000 00 01

ADD contents of 2 registersstore result in third

1010000100 RR RR RRex R0 = R1 + R21010000100 00 01 10

SUBTRACT contents of 2registers store result into third

1010001000 RR RR RRex R0 = R1 ndash R2

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 23

SP NOTES

1010001000 00 01 10

Halt the program 1111111111111111

Lecture_no14 Meaning of Assembly language program types Literals

bullUSING

bullBALR

bullSTART

bullEND

bullBR

bullLiteral

bullLTORG

bullBCT

bullEQU ExampleSTART EQU 10

The directive EQU stands for equals it allows you to give a numerical value to a label In this example the assembler would always replace START with 1016 Hence you could write

START EQU 10ORG STARTGOTO 10

bullORG

The directive ORG does not get translated into any code what it does is to set the address at which the next instruction will be stored

Example

ORG 0GOTO 10

The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000

But it will be placed at the address 0 ORG 0 does not get translated into any code but it affects where the code for GOTO 10 is placed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 24

SP NOTES

bullpseudo-ops vs machine-ops

Lecture_no15 Example on Assembly language program

Assembly language is a low-level programming language for a computer or other programmable device specific to a particular computer architecture in contrast to most high-level programming languages which are generally portable across multiple systems Assembly language is converted into executable machine code by a utility program referred to as an assembler like NASM MASM etc

LABEL OPCODE OPERANDS COMMENTS01 02 Program to multiply an integer by the constant 603 Before execution an integer must be stored in NUMBER0405 ORIG x305006 LEA R2 NUMBER07 LDW R2 R2 008 LEA R1 SIX09 LDW R1 R1 00A AND R3R30 Clear R3 It will0B contain the product0C The inner loop0D 0E AGAIN ADD R3R3R20F ADD R1R1-1 R1 keeps track of10 BRp AGAIN the iterations11 12 HALT13

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 25

SP NOTES

14 NUMBER BLKW 115 SIX FILL x000616 17 END

Figure 71 An assembly language programTwo of the parts (LABEL and COMMENTS) are optionalOpcodes and OperandsTwo of the parts (OPCODE and OPERANDS) are mandatory The OPCODE is a symbolic name for the opcode

The number of operands depends on the operation being performedAGAIN ADD R3R3R2The operands to be added are obtained from register 2 and from register 3 The resultis to be placed in register 3 We represent each of the registers 0 through 7 as R0 R2helliphelliphelliphellip R7

The location whose address is to be read is given the label NUMBER The destination into which that address is to be loaded is register 2LEA R2 NUMBERA literal value must contain a symbol identifying the representation base of the number We use for decimal x for hexadecimal and b for binary

LabelsLabels are symbolic names which are used to identify memory locations that are referred to explicitly in the programThere are two reasons for explicitly referring to a memory location1 The location contains the target of a branch instruction (for example AGAIN in line 0E)2 The location contains a value that is loaded or stored (for example NUMBER line 14 and SIX line 15)The location AGAIN is specifically referenced by the branch instruction in line 10

BRp AGAIN

If the result of ADD R1R1ndash1 is positive (as evidenced by the P condition code being set) then the program branches to the location explicitly referenced as AGAIN to perform another iterationThe value stored in the memory location explicitly referenced as NUMBER is loaded into R2If a location in the program is not explicitly referenced then there is no need to give it a label

CommentsComments are messages intended only for human consumption The purpose of comments is to make the program more comprehensible to the human reader

Pseudo-ops (Assembler Directives)

Actually a more formal name for a pseudo-op is assembler directive They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution Rather the pseudo-op is strictly a message to the assembler to help the assembler in the assembly processORIG

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 26

SP NOTES

ORIG tells the assembler where in memory to place the program In line 05 ORIG x3050 says start with location x3050 As a result the LEA R2NUMBER instruction will be put in location x3050FILLFILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand In line 15 the ninth location in the resultant program is initialized to the value x0006BLKWBLKW tells the assembler to set aside some number of sequential memory locations (ie a BLocK ofWords) in the program The actual number is the operand of the BLKW pseudo-op In line 11 the pseudo-op instructs the assembler to set aside one location in memory

Lecture_no16 surprise test

Lecture_no17 MODULE-2

AssemblerWhy learning AssemblerAssembler or other languages that is the question Why should I learn another language if I alreadylearned other programming languages The best argument while you live in France you are able to getthrough by speaking english but you will never feel at home then and life remains complicated You canget through with this but it is rather inappropriate If things need a hurry you should use the countryslanguageAssemblerTo translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmerDisassenblerTo translate machine code to their assembly source

Cross AssemblerThe assembler may run on one machine and produce object code for another

An assembly is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader

(Fig Assembler)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 27

SP NOTES

For example the externally defined symbols (library program) must be indicated to the loaderthe assembler does not know the address of these symbols and it is up to the loader to find theprograms containing them load them into memory and place the values of these symbols in thecalling program

(Fig Program Translation Steps)

Fundamental assembler functions1048698 Translate mnemonic operation codes their machine language equivalents1048698 Assign machine addresses to symbolic labels used by the programmer

Basic Assembler Functions

Convert mnemonic operation codes to machine language equivalents

Convert symbolic operands to machine addresses (pass 1)

Build machine instructions in the proper format

Convert data constants to internal (machine)representations

Error checking is provided

Write the object program and assembly listing files

Assembler directives

-gtThe assembler can also process assembler directivesbull Assembler directives (or pseudo-instructions) provide instructions to the assembler itself Theyare not translated into machine instructions

-gtPseudo-Instructions Not translated into machine instructions Providing information to the assembler-gtBasic assembler directives-

bull START -Specify name and staring address for the program END- Indicate the end of the source program and (optionally) specify the first executable instruction in

the program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 28

SP NOTES

BYTE -Generate character or hexadecimal constant occupying as many bytes as needed to represent the constant

WORD- Generate one-word integer constant RESB- Reserve the indicated number of bytes for a data area RESW- Reserve the indicated number of words for a data area

Lecture_no18 General design procedure of 360-type Assembler

Syntax Assembly language statements

Assembly language statements are entered one statement per line Each statement follows the following format

[label] mnemonic [operands] [comment]

The fields in the square brackets are optional A basic instruction has two parts the first one is the name of the instruction (or the mnemonic) which is to be executed and the second are the operands or the parameters of the command

An assembly language statement contains the following fields -gt Label Field -it is an identifier amp optional fieldThe maximum lengnth of label differs between assemblers It can be used to define a symbolSome accepts upto 32 characters long other only 4 charactersA label when declared is suffixed by a colon and begins with valid characters(AhelliphellipZ)

Ex-

START LDAA 24 H

Here Label START is equal to the address of the instruction LDAA 24 H -gt MnemonicOpcode -This field contains a mnemonic It stands for operation code or pseudo-opie a machine code instruction it requires additional information ieoperands -gt Operand Field -it specifies either the address or the data that the opcode requires -gt Comment Field allows the programmer to document the software This field is optional and is only printed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 29

SP NOTES

on the source listing for documentation purposes The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character

Types of Assembly language statements

It consists of 3 typeseg

(i)Imperative statement

(ii)Declarative statements

(iii)Assembler directive statements

Advantages amp disadvantages of Assembly language

Lecture_no19 Design Specification of 1-pass assembler with example

Following steps are used for design specification of an assembler-1)Identify the information necessary to perform a task2)Specify the data structure and define format of data structure3)Determine the processing necessary to obtain and maintain the information4)Determine the processing necessary to perform a task

Assembler Design can be done in

ndashSingle pass

ndashTwo pass

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 30

Source program

Pass1

OPTAB SYMTAB

Intermediate file

Pass2

SYMTAB

Object codes

OPTAB

SP NOTES

The Intermediate file include each source statement assigned address and error indicator

Pass I

Separate the symbols mnemonic op-code and operational fields Determine the storage requirement for every assembly language statement and up date the location

counter Build the symbol table (Table that is used to store each label and its corresponding value)

Pass II

Generate object code(perform actual translation)

One-Pass Assembler

The translation performed by an assembler is essentially a collection of substitutions machine operation code

for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines

bullSingle Pass AssemblerndashDoes everything in single passndashCannot resolve the forward referencing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 31

LOCCTR

SP NOTES

(detailed pass1 assembler flowchart)

Internal data structures1048698 the Operation Code Table (OPTAB) bull OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents1048698 the Symbol Table (SYMTAB) bull SYMTAB is used to store values (addresses) assigned to labels1048698 Location Counter (LOCCTR) bull This is a variable that is used to help in the assignment of address LOCCTR not LOCCTR + (instruction length)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 32

SP NOTES

Lecture_no20 Multi-pass assembler with example

Pass 1 Assign addresses to all statements in the program Save the values assigned to all labels for use in Pass 2 Perform some processing of assembler directivesPass 2 Assemble instructions Generate data values defined by BYTE WORD Perform processing of assembler directives not done in Pass 1 Write the object program and the assembly listing

(Detailed pass2 assembler flowchart)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 33

SP NOTES

Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler

The difference between one pass and two pass assemblers are-A one pass assembler passes over the source file exactly once in the same pass collecting the labels resolving future references and doing the actual assembly The difficult part is to resolve future label references and assemble code in one pass

A two pass assembler does two passes over the source file ( the second pass can be over a file generated in the first pass ) In the first pass all it does is looks for label definitions and introduces them in the symbol table In the second pass after the symbol table is complete it does the actual assembly by translating the operations and so on

Assembler Design Machine Dependent Assembler Features instruction formats and addressing modes program relocation

Machine Independent Assembler Features literals symbol-defining statements expressions program blocks control sections and program linking

Object Program

Header Col 1 H Col 2~7 Program name Col 8~13 Starting address (hex) Col 14-19 Length of object program in bytes (hex) Text Col1 T Col2~7 Starting address in this record (hex) Col 8~9 Length of object code in this record in bytes (hex) Col 10~69Object code (69-10+1)6=10 instructions End Col1 E Col 2~7 Address of 16 Col2 first executable instruction (hex) (END program_name)

Absolute ProgramThe program must be loaded at the specified (absolute) address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 34

SP NOTES

Relocatable Program

The actual starting address is not known until load time The program can be loaded at different address of memory

How to Solvea Modification recordb Base relativec Program counter relative

Modification record Col 1 M Col 2-7 Starting location of the address field to be modified relative to the beginning of the program Col 8-9 length of the address field to be modified in halfbytes

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 35

SP NOTES

Lecture_no22 Macro language amp Macro processor

MACRO PROCESSOR

The assembly language programmer often finds it necessary to repeat some statements or block of code several times in a program The block may consist of code to swap sets of registers do some arithmetic operations In this situation the programmer find a macro instruction facility useful Macro instruction (often called macros) are single line abbreviation for group of instructions In employing a macro the programmer essentially defines a single instruction to represent a block of code For every occurrence of this one-line macro instruction in his program the macro processing assembler substitute the entire block

A macro represents a commonly used group of statements in the source pro ramming language The macro processor replaces each macro instruction with the corresponding group of source language statements This is called expanding of macros

Obj

Basic Macro Processor Functions

Macro Definition and Usage

In this section we will present one more example to highlight salient aspects of macro-processor The example is very similar to Intels 8 bit microprocessor assembly language instruction

ExampleMACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 36

Source code (with macro)

Macro Processor

Expanded code

Compiler Assembler

SP NOTES

A macro definition is placed at the start of a program enclosed between the statements MACRO and ENDM A MACRO statement indicates that a macro definition starts while statement ENDM indicates the end of a macro definition Thus a group of statements start ing with MACRO and ending with ENDM constitutes one macro definition unit If many macros are to be defined in a program as many definition

modules will exist at the start of the program Each definition module contains a new operation and defines it to consist of a sequence of assembly language statement In example above INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence

The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement Appearance of a macro name in the mnemonic field amounts to a call on the macro The assembler replaces such a statement by the statement sequence comprising the macro This is known as macro expansion

INCRMT XY

is shown to lead to insertion of the assembly statements

LOAD X

ADD Y

STORE X

in its place All macro calls in a program are expanded in this fashion

Defining a Macro

Let us take another look at the macro definition unit

MACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

The macro header statement indicates the existence of a macro definition unit Absence of the header statement as the first statement of a program or ft first statement following a macro definition unit signals the start of the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 37

SP NOTES

main assembly language program The next statement in the definition unit is die prototype for a macro call This statement names the macro and indicates how the operands in any call on the macro would be written

The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion

Basic Macro Processor Functions

-gtDirectives used during usage of Macro

Macro Indicates begin of Macro MEND indicates end of Macro

-gtPrototype for Macro

Each argument starts with Name and macro Parameter list

MEND

-gtBODY The statement will be generated as the expansion of Macro

Macro Expansion

Each macro invocation statement will be expanded into the statements that form the body of the macro

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 38

SP NOTES

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)

In the definition of macro parameter In the macro invocation argument

Lecture_no23 Machine-Independent Macro Processor Features

MACRO INVOCATION

A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro

macro_name p1 p2 hellip

Difference between macro call and procedure call

Macro call statements of the macro body are expanded each time the macro is invoked

Procedure call statements of the subroutine appear only one regardless of how many times the subroutine is called

Macro parameters

i) Positional parameter

The prototype statement indicates how operands in a macro call would be written These operands are called parameters or arguments All parameters used in the prototype statement have names starting with the special character amp These parameters are known as formal parameters A macro call is written using parameter names which do not start with ft special character amp These are known as actual parameters

The lists of formal and actual parameters also called as formal and actual parameter lists specified in the prototype and macro call statements respectively establish a correspondence between each formal parameter and an actual parameter In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists Thus the first actual parameter in the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 39

SP NOTES

fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again

INCRMT ampA ampB prototype

INCRMT XY macro call

We see that X would be paired with ampA and Y with ampB While expanding a macro call any formal parameter appearing within a model statement is replaced by the corresponding actual parameter This is how expansion of the call INCR XY heads to the following statements

LOAD X

ADD Y

STORE X

Example

STRG DATA1 DATA2 DATA3

GENER DIRECT3

(ii) Keyword parameter

Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter

ExampleSTRG ampa3=DATA1 ampa2=DATA2 ampa1=DATA3

GENER TYPE=DIRECT CHANNEL=3

Features of macro facility

(1)Macro instruction argument

(2)Macro calls within macro(nested macro calls)

(3)Macro instruction defining(macros within macro)

(4)Conditional macro ExpansionLecture_no24 Design of 1-pass macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 40

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Register A special kind of very fast memory which is referred to by name rather than by a memory address number Registers often hold data values that are currently in use by the ALU or other subsystems of the architecture

Random access memory (RAM) The typical computer memory where each cell is addressed individually and may be updated or accessed in time that is independent of the cellrsquos location

Read-only memory (ROM) A memory that is usually very similar to RAM except it may only be accessed not updated

Store The process of writing a value to a memory cell

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 12

SP NOTES

Lecture_no7 Instruction format with microflowchart for ADD instruction

Flowchart for instruction fetchamp execution of general machine structure

Typical operating system Information flow in microflowchart for ADDSUBMULT

Memory operations

FETCH (addr)

1Put addr into MAR2Tell memory unit to ldquoloadrdquo3Memory copies data into MDR

STORE(addrnew-value)

1 Put addr into MAR2 Put new-value into MDR3 Tell memory unit to ldquostorerdquo4 Memory stores data from MDR into memory cell

Problem 1)

a) Write an assembly program for a GPR machine having 8 general purpose registers (R1R7) to evaluate the following expression Ignore the remainder part of the division process

( (a + b) (a + d) ) (a e)

where abde are variables with addresses ABDE respectively and each operation is defined as follows + --------gt ADD Ri Rj ( Rj lt- Ri + Rj) --------gt MULT Ri Rj ( Rj lt- Ri Rj) --------gt DIV Ri Rj ( Rj lt- Rj Ri)

a = M[A] ------gt content of memory location at address A same for B D E

move A D0 D0 = amove B D4 D4 = badd D0 D4 D4 = a + bmove D D1 D1 = dadd D0 D1 D1 = a + ddiv D1D4 D4 = (a + b) (a + d)move E D2 D2 = emult D0 D2 D2 = (a e)mult D4 D2 D2 = ( (a + b) (a + d) ) (a e)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 13

SP NOTES

Problem 2)

a) Provide micro-instructions in the form of detailed flowchart for the execution of the following instructions

b) Show the flow of the micro-instructions within the GPR architecture by highlighting the system buses and the registers used

i)move X D0 move contents of memory location X to register D0ii)move D0 X move contents of register D0 to memory location Xiii)movea 22(a3)25(a1) move contents from memory to memoryiv)bgtw label branch if greater than to location ldquolabelrdquo

Answer 2)

For each of the above for part b) draw the CPU consisting of all the registers (for example MAR MBR PC IR Address and Data Registers ALU) and Memory and show each of the micro-instruction as in part a)

i) move X D0Machine Instruction

0011 0000 0011 1000

Address of X

a) Micro-steps are

MAR lt-- PC

MBR lt-- M[MAR] fetch

IR lt-- MBR

| decode |PC lt-- PC + 2 PC points to source addressMAR lt-- PCMBR lt-- M[MAR] address of X in MBRMAR MBR move X to MARMBR M[MAR] read contents of location XD0 MBR PC lt-- PC + 2 PC points to next instruction

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 14

SP NOTES

ii) move D0 X

Machine Instruction

0011 1110 0000 0000

Address of X

a) Micro-steps areMAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | decode |PC lt-- PC + 2 PC points to destination addressMAR lt-- PCMBR M[MAR] read destination address XMAR MBR MAR has X now

MBR D0 source data in MBR

M[MAR] MBR data in Effective address X (which is in MAR)

PC lt-- PC + 2 PC points to next instruction

iii) movea 22(a3)25(a1)

Machine Instruction

0011 0011 0110 1011

0000 0000 0001 0110

0000 0000 0001 1001

a)Source Effective address = contents of a3 + $16temp lt--- M[a3 + $16]

Destination Effective address = contents of a1 + $19M[a1 + $19] lt- temp

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 15

SP NOTES

Micro-steps areMAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | decode |PC lt-- PC + 2 PC points to source displacementMAR lt-- PCMBR lt-- M[MAR] source displacement loadedMAR lt-- [A3 + MBR] source effective address calculatedMBR lt- M[MAR] source data in MBRWR lt- MBR saved temporarily since MBR is to be reusedPC lt-- PC + 2 PC points to destination displacementMAR lt-- PCMBR lt-- M[MAR] destination displacement loaded

MAR lt-- [A1 + MBR] destination effective address calculatedMBR lt- WR source data back to MBR

M[MAR] lt-- MBR source data moved to destination

PC lt-- PC + 2 PC points to next instruction

iv) bgtw label

Machine Instruction

0110 1110 0000 0000 ---- displacement ----

a)MAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | interpret decode |

TRUE FALSE | |

PC lt-- PC + 2 PC lt- PC + 4 MAR lt-- PC MBR lt-- M[MAR] get displacement PC lt-- PC + MBR execute

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 16

SP NOTES

Lecture_no8 Memory unit Register Data of IBM 360

General machine structure of IBM 360370

The IBM System360 (S360) was a mainframe computer system family announced by IBM on April 7 1964 and delivered between 1965 and 1978 It was the first family of computers designed to cover the complete range of applications from small to large both commercial and scientific

(i)Memory unit

Memory (storage) in System360 is addressed in terms of 8-bit bytes Various instructions operate on larger units called halfword (2 bytes) fullword (4 bytes) doubleword (8 bytes) quad word (16 bytes) and 2048 byte storage block specifying the leftmost (lowest address) of the unit Within a halfword fullword doubleword or quadword low numbered bytes are more significant than high numbered bytes this is sometimes referred to as big-endian Many uses for these units require aligning them on the corresponding boundaries Within this article the unqualified term word refers to a fullword

The architecture of System360 provided for up to 224 = 16777216 bytes of memory however the Model 67 extended the architecture and allowed 232 = 4294967296 bytes of virtual memory

(ii)Register-

Registers are areas of storage that are set aside in the processing unit

There are 16 registers in the 370 system (General purpose registerFloating-point registerControl amp status registerdata register address register)

Four fields of information-1) the name field- optional field and must begin with a character from the alphabet can consist of 8 characters2) opcode ndash mandatory field and must be followed by a blank3) operands ndash usually represent data that could be in main storage registers or as part of the instruction itself4) comments ndash used to clarify instructions

(iii)Data

Characters are stored as 8-bit bytes Integers are stored as twos complement binary halfword or fullword values Packed decimal numbers are stored as 1-16 8-bit bytes containing an odd number of decimal digits

followed by a 4-bit sign Sign values of hexadecimal A C E and F are positive and sign values of hexadecimal B and D are negative Digit values of hexadecimal A-F and sign values of 0-9 are invalid but the PACK and UNPK instructions do not test for validity

Zoned decimal numbers are stored as 1-16 8-bit bytes each containing a zone in bits 0-3 and a digit in bits 4-7 The zone of the rightmost byte is interpreted as a sign

Floating point numbers are only stored as fullword or doubleword values on older models On the 36085and 360195there are also extended precision floating point numbers stored as quadwords For all

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 17

SP NOTES

three formats bit 0 is a sign and bits 0-7 are a characteristic (exponent biased by 64) Bits 8-31 (8-63) are a hexadecimal fraction For extended precision the low order doubleword has its own sign and characteristic which are ignored on input and generated on output

Lecture_no9 Various instruction format of IBM 360370

( iv)Instruction

Instructions in the S360 are two four or six bytes in length with the opcode in byte 0 Instructions have one of the following formats

RR (two bytes) R tells that there is data residing in a register So in this type there is data in register R

Generally byte 1 specifies two 4-bit register numbers but in some cases eg SVC byte 1 is a single 8-bit immediate field

RS (four bytes) This instruction can have 2 or 3 operands It depends how many the opcode indicates If it is 2 the first is a register and the second storage If it is 3 then the first and second are registers and the third is a storage location

Byte 1 specifies two register numbers bytes 2-3 specify a base and displacement

RX (four bytes) The X also refers to a storage address however this specifies actual storage addresses in 3 pieces

Byte 1 bits 0-3 specifies either a register number or a modifier byte 1 bits 4-7 specifies the number of the general register to be used as an index bytes 2-3 specify a base and displacement

SI (four bytes) I tells that the operand consists of immediate data (meaning that the data is in the instruction itself) So in this type the first is a storage location and the second is immediate data

Byte 1 specifies an immediate field bytes 2-3 specify a base and displacement

SS (six bytes) S tells that that there is data in storage Here both operands have data in storage

Byte 1 specifies two 4-bit length fields or one 8-bit length field bytes 2-3 and 4-5 each specify a base and displacement The encoding of the length fields is length-1

Instructions must be on a two-byte boundary in memory hence the low-order bit of the instruction address is always 0

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 18

SP NOTES

Addressing Mode

1Implied mode Operand is in a register implied by the OpCode of the instruction ExampleADD 312 Immediate mode Operand is a constant value specified in the instruction itself ExampleADD R 103 Register mode Operand is in register specified in the instruction itself ExampleADD S D

4 Register indirect mode Operand are is in a memory address that is content of a register specifiedby the instruction itself ExampleADD (D) 35 Direct mode Absolute address of operand is explicitly specified in instruction ExampleADD 1234 S6 Indirect mode Absolute address of operand is content of a memory address ExampleADD [1234] 107 Relative mode Content of PC + OpAddr ExampleADD D $S8 Indexed mode Content of an index register + OpAddr ExampleADD D 500(S)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 19

SP NOTES

Lecture_no10 Machine language Long way no looping method

Elementary Instruction Set

Typical Data Transfer Instructions

Typical Arithmetic Instructions

Typical Logical and Bit Manipulation Instr

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 20

SP NOTES

Example Write a program that compare two positive numbers x and y Output is -1 if x lt y 0if x = y +1 if x gt y

MOVE R1 x MOVE R2 y MOVE R3 0 SUB R1 R2 BN Less BZ Zero MOVE R3 +1Less MOVE R3 -1Zero MOVE R3 0 HLT

Lecture_no11 Address modification using instruction as data ampusing index register

Lecture_no12 Looping method with example

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 21

SP NOTES

Lecture_no13 Assembly language machine language

Assembly LanguageAssembly instructions are just shorthand for machine instructions

Machine language Equivalent assembly1000000100100101 LOAD R1 51000000101000101 LOAD R2 51010000100000110 ADD R0 R1 R21000001000000110 SAVE R0 61111111111111111 HALT (For all assembly instructions that compute a result the first argument is the destination)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 22

SP NOTES

Very easy to write an Assembly Language -gtMachine language translator

Exercise

What would be the assembly instructions to swap the contents of registers 1 amp 2Exercise Solution

STORE 1 R1MOVE R1 R2LOAD R2 1HALT

Machine Language

The machine language for a particular computer is tied to the architecture of the CPU For example G4 Macs have a different machine language than Intel PCs We will look at the machine language of a simple simulated computer

Machine Language ExampleOur computer has 4 registers and 32 memory locations Each instruction is 16 bits Here is a machine language program for our simulated computer

10000001001001011000000101000101101000010000011010000010000001101111111111111111

LOAD contents of memorylocation into register

100000010 RR MMMMMex R0 = Mem[3]100000010 00 00011

STORE contents of registerinto memory location

100000100 RR MMMMMex Mem[4] = R0100000100 00 00100

MOVE contents of oneregister into another register

100100010000 RR RRex R0 = R1100100010000 00 01

ADD contents of 2 registersstore result in third

1010000100 RR RR RRex R0 = R1 + R21010000100 00 01 10

SUBTRACT contents of 2registers store result into third

1010001000 RR RR RRex R0 = R1 ndash R2

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 23

SP NOTES

1010001000 00 01 10

Halt the program 1111111111111111

Lecture_no14 Meaning of Assembly language program types Literals

bullUSING

bullBALR

bullSTART

bullEND

bullBR

bullLiteral

bullLTORG

bullBCT

bullEQU ExampleSTART EQU 10

The directive EQU stands for equals it allows you to give a numerical value to a label In this example the assembler would always replace START with 1016 Hence you could write

START EQU 10ORG STARTGOTO 10

bullORG

The directive ORG does not get translated into any code what it does is to set the address at which the next instruction will be stored

Example

ORG 0GOTO 10

The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000

But it will be placed at the address 0 ORG 0 does not get translated into any code but it affects where the code for GOTO 10 is placed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 24

SP NOTES

bullpseudo-ops vs machine-ops

Lecture_no15 Example on Assembly language program

Assembly language is a low-level programming language for a computer or other programmable device specific to a particular computer architecture in contrast to most high-level programming languages which are generally portable across multiple systems Assembly language is converted into executable machine code by a utility program referred to as an assembler like NASM MASM etc

LABEL OPCODE OPERANDS COMMENTS01 02 Program to multiply an integer by the constant 603 Before execution an integer must be stored in NUMBER0405 ORIG x305006 LEA R2 NUMBER07 LDW R2 R2 008 LEA R1 SIX09 LDW R1 R1 00A AND R3R30 Clear R3 It will0B contain the product0C The inner loop0D 0E AGAIN ADD R3R3R20F ADD R1R1-1 R1 keeps track of10 BRp AGAIN the iterations11 12 HALT13

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 25

SP NOTES

14 NUMBER BLKW 115 SIX FILL x000616 17 END

Figure 71 An assembly language programTwo of the parts (LABEL and COMMENTS) are optionalOpcodes and OperandsTwo of the parts (OPCODE and OPERANDS) are mandatory The OPCODE is a symbolic name for the opcode

The number of operands depends on the operation being performedAGAIN ADD R3R3R2The operands to be added are obtained from register 2 and from register 3 The resultis to be placed in register 3 We represent each of the registers 0 through 7 as R0 R2helliphelliphelliphellip R7

The location whose address is to be read is given the label NUMBER The destination into which that address is to be loaded is register 2LEA R2 NUMBERA literal value must contain a symbol identifying the representation base of the number We use for decimal x for hexadecimal and b for binary

LabelsLabels are symbolic names which are used to identify memory locations that are referred to explicitly in the programThere are two reasons for explicitly referring to a memory location1 The location contains the target of a branch instruction (for example AGAIN in line 0E)2 The location contains a value that is loaded or stored (for example NUMBER line 14 and SIX line 15)The location AGAIN is specifically referenced by the branch instruction in line 10

BRp AGAIN

If the result of ADD R1R1ndash1 is positive (as evidenced by the P condition code being set) then the program branches to the location explicitly referenced as AGAIN to perform another iterationThe value stored in the memory location explicitly referenced as NUMBER is loaded into R2If a location in the program is not explicitly referenced then there is no need to give it a label

CommentsComments are messages intended only for human consumption The purpose of comments is to make the program more comprehensible to the human reader

Pseudo-ops (Assembler Directives)

Actually a more formal name for a pseudo-op is assembler directive They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution Rather the pseudo-op is strictly a message to the assembler to help the assembler in the assembly processORIG

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 26

SP NOTES

ORIG tells the assembler where in memory to place the program In line 05 ORIG x3050 says start with location x3050 As a result the LEA R2NUMBER instruction will be put in location x3050FILLFILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand In line 15 the ninth location in the resultant program is initialized to the value x0006BLKWBLKW tells the assembler to set aside some number of sequential memory locations (ie a BLocK ofWords) in the program The actual number is the operand of the BLKW pseudo-op In line 11 the pseudo-op instructs the assembler to set aside one location in memory

Lecture_no16 surprise test

Lecture_no17 MODULE-2

AssemblerWhy learning AssemblerAssembler or other languages that is the question Why should I learn another language if I alreadylearned other programming languages The best argument while you live in France you are able to getthrough by speaking english but you will never feel at home then and life remains complicated You canget through with this but it is rather inappropriate If things need a hurry you should use the countryslanguageAssemblerTo translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmerDisassenblerTo translate machine code to their assembly source

Cross AssemblerThe assembler may run on one machine and produce object code for another

An assembly is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader

(Fig Assembler)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 27

SP NOTES

For example the externally defined symbols (library program) must be indicated to the loaderthe assembler does not know the address of these symbols and it is up to the loader to find theprograms containing them load them into memory and place the values of these symbols in thecalling program

(Fig Program Translation Steps)

Fundamental assembler functions1048698 Translate mnemonic operation codes their machine language equivalents1048698 Assign machine addresses to symbolic labels used by the programmer

Basic Assembler Functions

Convert mnemonic operation codes to machine language equivalents

Convert symbolic operands to machine addresses (pass 1)

Build machine instructions in the proper format

Convert data constants to internal (machine)representations

Error checking is provided

Write the object program and assembly listing files

Assembler directives

-gtThe assembler can also process assembler directivesbull Assembler directives (or pseudo-instructions) provide instructions to the assembler itself Theyare not translated into machine instructions

-gtPseudo-Instructions Not translated into machine instructions Providing information to the assembler-gtBasic assembler directives-

bull START -Specify name and staring address for the program END- Indicate the end of the source program and (optionally) specify the first executable instruction in

the program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 28

SP NOTES

BYTE -Generate character or hexadecimal constant occupying as many bytes as needed to represent the constant

WORD- Generate one-word integer constant RESB- Reserve the indicated number of bytes for a data area RESW- Reserve the indicated number of words for a data area

Lecture_no18 General design procedure of 360-type Assembler

Syntax Assembly language statements

Assembly language statements are entered one statement per line Each statement follows the following format

[label] mnemonic [operands] [comment]

The fields in the square brackets are optional A basic instruction has two parts the first one is the name of the instruction (or the mnemonic) which is to be executed and the second are the operands or the parameters of the command

An assembly language statement contains the following fields -gt Label Field -it is an identifier amp optional fieldThe maximum lengnth of label differs between assemblers It can be used to define a symbolSome accepts upto 32 characters long other only 4 charactersA label when declared is suffixed by a colon and begins with valid characters(AhelliphellipZ)

Ex-

START LDAA 24 H

Here Label START is equal to the address of the instruction LDAA 24 H -gt MnemonicOpcode -This field contains a mnemonic It stands for operation code or pseudo-opie a machine code instruction it requires additional information ieoperands -gt Operand Field -it specifies either the address or the data that the opcode requires -gt Comment Field allows the programmer to document the software This field is optional and is only printed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 29

SP NOTES

on the source listing for documentation purposes The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character

Types of Assembly language statements

It consists of 3 typeseg

(i)Imperative statement

(ii)Declarative statements

(iii)Assembler directive statements

Advantages amp disadvantages of Assembly language

Lecture_no19 Design Specification of 1-pass assembler with example

Following steps are used for design specification of an assembler-1)Identify the information necessary to perform a task2)Specify the data structure and define format of data structure3)Determine the processing necessary to obtain and maintain the information4)Determine the processing necessary to perform a task

Assembler Design can be done in

ndashSingle pass

ndashTwo pass

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 30

Source program

Pass1

OPTAB SYMTAB

Intermediate file

Pass2

SYMTAB

Object codes

OPTAB

SP NOTES

The Intermediate file include each source statement assigned address and error indicator

Pass I

Separate the symbols mnemonic op-code and operational fields Determine the storage requirement for every assembly language statement and up date the location

counter Build the symbol table (Table that is used to store each label and its corresponding value)

Pass II

Generate object code(perform actual translation)

One-Pass Assembler

The translation performed by an assembler is essentially a collection of substitutions machine operation code

for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines

bullSingle Pass AssemblerndashDoes everything in single passndashCannot resolve the forward referencing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 31

LOCCTR

SP NOTES

(detailed pass1 assembler flowchart)

Internal data structures1048698 the Operation Code Table (OPTAB) bull OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents1048698 the Symbol Table (SYMTAB) bull SYMTAB is used to store values (addresses) assigned to labels1048698 Location Counter (LOCCTR) bull This is a variable that is used to help in the assignment of address LOCCTR not LOCCTR + (instruction length)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 32

SP NOTES

Lecture_no20 Multi-pass assembler with example

Pass 1 Assign addresses to all statements in the program Save the values assigned to all labels for use in Pass 2 Perform some processing of assembler directivesPass 2 Assemble instructions Generate data values defined by BYTE WORD Perform processing of assembler directives not done in Pass 1 Write the object program and the assembly listing

(Detailed pass2 assembler flowchart)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 33

SP NOTES

Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler

The difference between one pass and two pass assemblers are-A one pass assembler passes over the source file exactly once in the same pass collecting the labels resolving future references and doing the actual assembly The difficult part is to resolve future label references and assemble code in one pass

A two pass assembler does two passes over the source file ( the second pass can be over a file generated in the first pass ) In the first pass all it does is looks for label definitions and introduces them in the symbol table In the second pass after the symbol table is complete it does the actual assembly by translating the operations and so on

Assembler Design Machine Dependent Assembler Features instruction formats and addressing modes program relocation

Machine Independent Assembler Features literals symbol-defining statements expressions program blocks control sections and program linking

Object Program

Header Col 1 H Col 2~7 Program name Col 8~13 Starting address (hex) Col 14-19 Length of object program in bytes (hex) Text Col1 T Col2~7 Starting address in this record (hex) Col 8~9 Length of object code in this record in bytes (hex) Col 10~69Object code (69-10+1)6=10 instructions End Col1 E Col 2~7 Address of 16 Col2 first executable instruction (hex) (END program_name)

Absolute ProgramThe program must be loaded at the specified (absolute) address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 34

SP NOTES

Relocatable Program

The actual starting address is not known until load time The program can be loaded at different address of memory

How to Solvea Modification recordb Base relativec Program counter relative

Modification record Col 1 M Col 2-7 Starting location of the address field to be modified relative to the beginning of the program Col 8-9 length of the address field to be modified in halfbytes

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 35

SP NOTES

Lecture_no22 Macro language amp Macro processor

MACRO PROCESSOR

The assembly language programmer often finds it necessary to repeat some statements or block of code several times in a program The block may consist of code to swap sets of registers do some arithmetic operations In this situation the programmer find a macro instruction facility useful Macro instruction (often called macros) are single line abbreviation for group of instructions In employing a macro the programmer essentially defines a single instruction to represent a block of code For every occurrence of this one-line macro instruction in his program the macro processing assembler substitute the entire block

A macro represents a commonly used group of statements in the source pro ramming language The macro processor replaces each macro instruction with the corresponding group of source language statements This is called expanding of macros

Obj

Basic Macro Processor Functions

Macro Definition and Usage

In this section we will present one more example to highlight salient aspects of macro-processor The example is very similar to Intels 8 bit microprocessor assembly language instruction

ExampleMACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 36

Source code (with macro)

Macro Processor

Expanded code

Compiler Assembler

SP NOTES

A macro definition is placed at the start of a program enclosed between the statements MACRO and ENDM A MACRO statement indicates that a macro definition starts while statement ENDM indicates the end of a macro definition Thus a group of statements start ing with MACRO and ending with ENDM constitutes one macro definition unit If many macros are to be defined in a program as many definition

modules will exist at the start of the program Each definition module contains a new operation and defines it to consist of a sequence of assembly language statement In example above INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence

The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement Appearance of a macro name in the mnemonic field amounts to a call on the macro The assembler replaces such a statement by the statement sequence comprising the macro This is known as macro expansion

INCRMT XY

is shown to lead to insertion of the assembly statements

LOAD X

ADD Y

STORE X

in its place All macro calls in a program are expanded in this fashion

Defining a Macro

Let us take another look at the macro definition unit

MACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

The macro header statement indicates the existence of a macro definition unit Absence of the header statement as the first statement of a program or ft first statement following a macro definition unit signals the start of the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 37

SP NOTES

main assembly language program The next statement in the definition unit is die prototype for a macro call This statement names the macro and indicates how the operands in any call on the macro would be written

The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion

Basic Macro Processor Functions

-gtDirectives used during usage of Macro

Macro Indicates begin of Macro MEND indicates end of Macro

-gtPrototype for Macro

Each argument starts with Name and macro Parameter list

MEND

-gtBODY The statement will be generated as the expansion of Macro

Macro Expansion

Each macro invocation statement will be expanded into the statements that form the body of the macro

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 38

SP NOTES

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)

In the definition of macro parameter In the macro invocation argument

Lecture_no23 Machine-Independent Macro Processor Features

MACRO INVOCATION

A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro

macro_name p1 p2 hellip

Difference between macro call and procedure call

Macro call statements of the macro body are expanded each time the macro is invoked

Procedure call statements of the subroutine appear only one regardless of how many times the subroutine is called

Macro parameters

i) Positional parameter

The prototype statement indicates how operands in a macro call would be written These operands are called parameters or arguments All parameters used in the prototype statement have names starting with the special character amp These parameters are known as formal parameters A macro call is written using parameter names which do not start with ft special character amp These are known as actual parameters

The lists of formal and actual parameters also called as formal and actual parameter lists specified in the prototype and macro call statements respectively establish a correspondence between each formal parameter and an actual parameter In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists Thus the first actual parameter in the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 39

SP NOTES

fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again

INCRMT ampA ampB prototype

INCRMT XY macro call

We see that X would be paired with ampA and Y with ampB While expanding a macro call any formal parameter appearing within a model statement is replaced by the corresponding actual parameter This is how expansion of the call INCR XY heads to the following statements

LOAD X

ADD Y

STORE X

Example

STRG DATA1 DATA2 DATA3

GENER DIRECT3

(ii) Keyword parameter

Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter

ExampleSTRG ampa3=DATA1 ampa2=DATA2 ampa1=DATA3

GENER TYPE=DIRECT CHANNEL=3

Features of macro facility

(1)Macro instruction argument

(2)Macro calls within macro(nested macro calls)

(3)Macro instruction defining(macros within macro)

(4)Conditional macro ExpansionLecture_no24 Design of 1-pass macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 40

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Lecture_no7 Instruction format with microflowchart for ADD instruction

Flowchart for instruction fetchamp execution of general machine structure

Typical operating system Information flow in microflowchart for ADDSUBMULT

Memory operations

FETCH (addr)

1Put addr into MAR2Tell memory unit to ldquoloadrdquo3Memory copies data into MDR

STORE(addrnew-value)

1 Put addr into MAR2 Put new-value into MDR3 Tell memory unit to ldquostorerdquo4 Memory stores data from MDR into memory cell

Problem 1)

a) Write an assembly program for a GPR machine having 8 general purpose registers (R1R7) to evaluate the following expression Ignore the remainder part of the division process

( (a + b) (a + d) ) (a e)

where abde are variables with addresses ABDE respectively and each operation is defined as follows + --------gt ADD Ri Rj ( Rj lt- Ri + Rj) --------gt MULT Ri Rj ( Rj lt- Ri Rj) --------gt DIV Ri Rj ( Rj lt- Rj Ri)

a = M[A] ------gt content of memory location at address A same for B D E

move A D0 D0 = amove B D4 D4 = badd D0 D4 D4 = a + bmove D D1 D1 = dadd D0 D1 D1 = a + ddiv D1D4 D4 = (a + b) (a + d)move E D2 D2 = emult D0 D2 D2 = (a e)mult D4 D2 D2 = ( (a + b) (a + d) ) (a e)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 13

SP NOTES

Problem 2)

a) Provide micro-instructions in the form of detailed flowchart for the execution of the following instructions

b) Show the flow of the micro-instructions within the GPR architecture by highlighting the system buses and the registers used

i)move X D0 move contents of memory location X to register D0ii)move D0 X move contents of register D0 to memory location Xiii)movea 22(a3)25(a1) move contents from memory to memoryiv)bgtw label branch if greater than to location ldquolabelrdquo

Answer 2)

For each of the above for part b) draw the CPU consisting of all the registers (for example MAR MBR PC IR Address and Data Registers ALU) and Memory and show each of the micro-instruction as in part a)

i) move X D0Machine Instruction

0011 0000 0011 1000

Address of X

a) Micro-steps are

MAR lt-- PC

MBR lt-- M[MAR] fetch

IR lt-- MBR

| decode |PC lt-- PC + 2 PC points to source addressMAR lt-- PCMBR lt-- M[MAR] address of X in MBRMAR MBR move X to MARMBR M[MAR] read contents of location XD0 MBR PC lt-- PC + 2 PC points to next instruction

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 14

SP NOTES

ii) move D0 X

Machine Instruction

0011 1110 0000 0000

Address of X

a) Micro-steps areMAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | decode |PC lt-- PC + 2 PC points to destination addressMAR lt-- PCMBR M[MAR] read destination address XMAR MBR MAR has X now

MBR D0 source data in MBR

M[MAR] MBR data in Effective address X (which is in MAR)

PC lt-- PC + 2 PC points to next instruction

iii) movea 22(a3)25(a1)

Machine Instruction

0011 0011 0110 1011

0000 0000 0001 0110

0000 0000 0001 1001

a)Source Effective address = contents of a3 + $16temp lt--- M[a3 + $16]

Destination Effective address = contents of a1 + $19M[a1 + $19] lt- temp

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 15

SP NOTES

Micro-steps areMAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | decode |PC lt-- PC + 2 PC points to source displacementMAR lt-- PCMBR lt-- M[MAR] source displacement loadedMAR lt-- [A3 + MBR] source effective address calculatedMBR lt- M[MAR] source data in MBRWR lt- MBR saved temporarily since MBR is to be reusedPC lt-- PC + 2 PC points to destination displacementMAR lt-- PCMBR lt-- M[MAR] destination displacement loaded

MAR lt-- [A1 + MBR] destination effective address calculatedMBR lt- WR source data back to MBR

M[MAR] lt-- MBR source data moved to destination

PC lt-- PC + 2 PC points to next instruction

iv) bgtw label

Machine Instruction

0110 1110 0000 0000 ---- displacement ----

a)MAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | interpret decode |

TRUE FALSE | |

PC lt-- PC + 2 PC lt- PC + 4 MAR lt-- PC MBR lt-- M[MAR] get displacement PC lt-- PC + MBR execute

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 16

SP NOTES

Lecture_no8 Memory unit Register Data of IBM 360

General machine structure of IBM 360370

The IBM System360 (S360) was a mainframe computer system family announced by IBM on April 7 1964 and delivered between 1965 and 1978 It was the first family of computers designed to cover the complete range of applications from small to large both commercial and scientific

(i)Memory unit

Memory (storage) in System360 is addressed in terms of 8-bit bytes Various instructions operate on larger units called halfword (2 bytes) fullword (4 bytes) doubleword (8 bytes) quad word (16 bytes) and 2048 byte storage block specifying the leftmost (lowest address) of the unit Within a halfword fullword doubleword or quadword low numbered bytes are more significant than high numbered bytes this is sometimes referred to as big-endian Many uses for these units require aligning them on the corresponding boundaries Within this article the unqualified term word refers to a fullword

The architecture of System360 provided for up to 224 = 16777216 bytes of memory however the Model 67 extended the architecture and allowed 232 = 4294967296 bytes of virtual memory

(ii)Register-

Registers are areas of storage that are set aside in the processing unit

There are 16 registers in the 370 system (General purpose registerFloating-point registerControl amp status registerdata register address register)

Four fields of information-1) the name field- optional field and must begin with a character from the alphabet can consist of 8 characters2) opcode ndash mandatory field and must be followed by a blank3) operands ndash usually represent data that could be in main storage registers or as part of the instruction itself4) comments ndash used to clarify instructions

(iii)Data

Characters are stored as 8-bit bytes Integers are stored as twos complement binary halfword or fullword values Packed decimal numbers are stored as 1-16 8-bit bytes containing an odd number of decimal digits

followed by a 4-bit sign Sign values of hexadecimal A C E and F are positive and sign values of hexadecimal B and D are negative Digit values of hexadecimal A-F and sign values of 0-9 are invalid but the PACK and UNPK instructions do not test for validity

Zoned decimal numbers are stored as 1-16 8-bit bytes each containing a zone in bits 0-3 and a digit in bits 4-7 The zone of the rightmost byte is interpreted as a sign

Floating point numbers are only stored as fullword or doubleword values on older models On the 36085and 360195there are also extended precision floating point numbers stored as quadwords For all

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 17

SP NOTES

three formats bit 0 is a sign and bits 0-7 are a characteristic (exponent biased by 64) Bits 8-31 (8-63) are a hexadecimal fraction For extended precision the low order doubleword has its own sign and characteristic which are ignored on input and generated on output

Lecture_no9 Various instruction format of IBM 360370

( iv)Instruction

Instructions in the S360 are two four or six bytes in length with the opcode in byte 0 Instructions have one of the following formats

RR (two bytes) R tells that there is data residing in a register So in this type there is data in register R

Generally byte 1 specifies two 4-bit register numbers but in some cases eg SVC byte 1 is a single 8-bit immediate field

RS (four bytes) This instruction can have 2 or 3 operands It depends how many the opcode indicates If it is 2 the first is a register and the second storage If it is 3 then the first and second are registers and the third is a storage location

Byte 1 specifies two register numbers bytes 2-3 specify a base and displacement

RX (four bytes) The X also refers to a storage address however this specifies actual storage addresses in 3 pieces

Byte 1 bits 0-3 specifies either a register number or a modifier byte 1 bits 4-7 specifies the number of the general register to be used as an index bytes 2-3 specify a base and displacement

SI (four bytes) I tells that the operand consists of immediate data (meaning that the data is in the instruction itself) So in this type the first is a storage location and the second is immediate data

Byte 1 specifies an immediate field bytes 2-3 specify a base and displacement

SS (six bytes) S tells that that there is data in storage Here both operands have data in storage

Byte 1 specifies two 4-bit length fields or one 8-bit length field bytes 2-3 and 4-5 each specify a base and displacement The encoding of the length fields is length-1

Instructions must be on a two-byte boundary in memory hence the low-order bit of the instruction address is always 0

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 18

SP NOTES

Addressing Mode

1Implied mode Operand is in a register implied by the OpCode of the instruction ExampleADD 312 Immediate mode Operand is a constant value specified in the instruction itself ExampleADD R 103 Register mode Operand is in register specified in the instruction itself ExampleADD S D

4 Register indirect mode Operand are is in a memory address that is content of a register specifiedby the instruction itself ExampleADD (D) 35 Direct mode Absolute address of operand is explicitly specified in instruction ExampleADD 1234 S6 Indirect mode Absolute address of operand is content of a memory address ExampleADD [1234] 107 Relative mode Content of PC + OpAddr ExampleADD D $S8 Indexed mode Content of an index register + OpAddr ExampleADD D 500(S)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 19

SP NOTES

Lecture_no10 Machine language Long way no looping method

Elementary Instruction Set

Typical Data Transfer Instructions

Typical Arithmetic Instructions

Typical Logical and Bit Manipulation Instr

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 20

SP NOTES

Example Write a program that compare two positive numbers x and y Output is -1 if x lt y 0if x = y +1 if x gt y

MOVE R1 x MOVE R2 y MOVE R3 0 SUB R1 R2 BN Less BZ Zero MOVE R3 +1Less MOVE R3 -1Zero MOVE R3 0 HLT

Lecture_no11 Address modification using instruction as data ampusing index register

Lecture_no12 Looping method with example

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 21

SP NOTES

Lecture_no13 Assembly language machine language

Assembly LanguageAssembly instructions are just shorthand for machine instructions

Machine language Equivalent assembly1000000100100101 LOAD R1 51000000101000101 LOAD R2 51010000100000110 ADD R0 R1 R21000001000000110 SAVE R0 61111111111111111 HALT (For all assembly instructions that compute a result the first argument is the destination)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 22

SP NOTES

Very easy to write an Assembly Language -gtMachine language translator

Exercise

What would be the assembly instructions to swap the contents of registers 1 amp 2Exercise Solution

STORE 1 R1MOVE R1 R2LOAD R2 1HALT

Machine Language

The machine language for a particular computer is tied to the architecture of the CPU For example G4 Macs have a different machine language than Intel PCs We will look at the machine language of a simple simulated computer

Machine Language ExampleOur computer has 4 registers and 32 memory locations Each instruction is 16 bits Here is a machine language program for our simulated computer

10000001001001011000000101000101101000010000011010000010000001101111111111111111

LOAD contents of memorylocation into register

100000010 RR MMMMMex R0 = Mem[3]100000010 00 00011

STORE contents of registerinto memory location

100000100 RR MMMMMex Mem[4] = R0100000100 00 00100

MOVE contents of oneregister into another register

100100010000 RR RRex R0 = R1100100010000 00 01

ADD contents of 2 registersstore result in third

1010000100 RR RR RRex R0 = R1 + R21010000100 00 01 10

SUBTRACT contents of 2registers store result into third

1010001000 RR RR RRex R0 = R1 ndash R2

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 23

SP NOTES

1010001000 00 01 10

Halt the program 1111111111111111

Lecture_no14 Meaning of Assembly language program types Literals

bullUSING

bullBALR

bullSTART

bullEND

bullBR

bullLiteral

bullLTORG

bullBCT

bullEQU ExampleSTART EQU 10

The directive EQU stands for equals it allows you to give a numerical value to a label In this example the assembler would always replace START with 1016 Hence you could write

START EQU 10ORG STARTGOTO 10

bullORG

The directive ORG does not get translated into any code what it does is to set the address at which the next instruction will be stored

Example

ORG 0GOTO 10

The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000

But it will be placed at the address 0 ORG 0 does not get translated into any code but it affects where the code for GOTO 10 is placed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 24

SP NOTES

bullpseudo-ops vs machine-ops

Lecture_no15 Example on Assembly language program

Assembly language is a low-level programming language for a computer or other programmable device specific to a particular computer architecture in contrast to most high-level programming languages which are generally portable across multiple systems Assembly language is converted into executable machine code by a utility program referred to as an assembler like NASM MASM etc

LABEL OPCODE OPERANDS COMMENTS01 02 Program to multiply an integer by the constant 603 Before execution an integer must be stored in NUMBER0405 ORIG x305006 LEA R2 NUMBER07 LDW R2 R2 008 LEA R1 SIX09 LDW R1 R1 00A AND R3R30 Clear R3 It will0B contain the product0C The inner loop0D 0E AGAIN ADD R3R3R20F ADD R1R1-1 R1 keeps track of10 BRp AGAIN the iterations11 12 HALT13

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 25

SP NOTES

14 NUMBER BLKW 115 SIX FILL x000616 17 END

Figure 71 An assembly language programTwo of the parts (LABEL and COMMENTS) are optionalOpcodes and OperandsTwo of the parts (OPCODE and OPERANDS) are mandatory The OPCODE is a symbolic name for the opcode

The number of operands depends on the operation being performedAGAIN ADD R3R3R2The operands to be added are obtained from register 2 and from register 3 The resultis to be placed in register 3 We represent each of the registers 0 through 7 as R0 R2helliphelliphelliphellip R7

The location whose address is to be read is given the label NUMBER The destination into which that address is to be loaded is register 2LEA R2 NUMBERA literal value must contain a symbol identifying the representation base of the number We use for decimal x for hexadecimal and b for binary

LabelsLabels are symbolic names which are used to identify memory locations that are referred to explicitly in the programThere are two reasons for explicitly referring to a memory location1 The location contains the target of a branch instruction (for example AGAIN in line 0E)2 The location contains a value that is loaded or stored (for example NUMBER line 14 and SIX line 15)The location AGAIN is specifically referenced by the branch instruction in line 10

BRp AGAIN

If the result of ADD R1R1ndash1 is positive (as evidenced by the P condition code being set) then the program branches to the location explicitly referenced as AGAIN to perform another iterationThe value stored in the memory location explicitly referenced as NUMBER is loaded into R2If a location in the program is not explicitly referenced then there is no need to give it a label

CommentsComments are messages intended only for human consumption The purpose of comments is to make the program more comprehensible to the human reader

Pseudo-ops (Assembler Directives)

Actually a more formal name for a pseudo-op is assembler directive They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution Rather the pseudo-op is strictly a message to the assembler to help the assembler in the assembly processORIG

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 26

SP NOTES

ORIG tells the assembler where in memory to place the program In line 05 ORIG x3050 says start with location x3050 As a result the LEA R2NUMBER instruction will be put in location x3050FILLFILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand In line 15 the ninth location in the resultant program is initialized to the value x0006BLKWBLKW tells the assembler to set aside some number of sequential memory locations (ie a BLocK ofWords) in the program The actual number is the operand of the BLKW pseudo-op In line 11 the pseudo-op instructs the assembler to set aside one location in memory

Lecture_no16 surprise test

Lecture_no17 MODULE-2

AssemblerWhy learning AssemblerAssembler or other languages that is the question Why should I learn another language if I alreadylearned other programming languages The best argument while you live in France you are able to getthrough by speaking english but you will never feel at home then and life remains complicated You canget through with this but it is rather inappropriate If things need a hurry you should use the countryslanguageAssemblerTo translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmerDisassenblerTo translate machine code to their assembly source

Cross AssemblerThe assembler may run on one machine and produce object code for another

An assembly is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader

(Fig Assembler)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 27

SP NOTES

For example the externally defined symbols (library program) must be indicated to the loaderthe assembler does not know the address of these symbols and it is up to the loader to find theprograms containing them load them into memory and place the values of these symbols in thecalling program

(Fig Program Translation Steps)

Fundamental assembler functions1048698 Translate mnemonic operation codes their machine language equivalents1048698 Assign machine addresses to symbolic labels used by the programmer

Basic Assembler Functions

Convert mnemonic operation codes to machine language equivalents

Convert symbolic operands to machine addresses (pass 1)

Build machine instructions in the proper format

Convert data constants to internal (machine)representations

Error checking is provided

Write the object program and assembly listing files

Assembler directives

-gtThe assembler can also process assembler directivesbull Assembler directives (or pseudo-instructions) provide instructions to the assembler itself Theyare not translated into machine instructions

-gtPseudo-Instructions Not translated into machine instructions Providing information to the assembler-gtBasic assembler directives-

bull START -Specify name and staring address for the program END- Indicate the end of the source program and (optionally) specify the first executable instruction in

the program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 28

SP NOTES

BYTE -Generate character or hexadecimal constant occupying as many bytes as needed to represent the constant

WORD- Generate one-word integer constant RESB- Reserve the indicated number of bytes for a data area RESW- Reserve the indicated number of words for a data area

Lecture_no18 General design procedure of 360-type Assembler

Syntax Assembly language statements

Assembly language statements are entered one statement per line Each statement follows the following format

[label] mnemonic [operands] [comment]

The fields in the square brackets are optional A basic instruction has two parts the first one is the name of the instruction (or the mnemonic) which is to be executed and the second are the operands or the parameters of the command

An assembly language statement contains the following fields -gt Label Field -it is an identifier amp optional fieldThe maximum lengnth of label differs between assemblers It can be used to define a symbolSome accepts upto 32 characters long other only 4 charactersA label when declared is suffixed by a colon and begins with valid characters(AhelliphellipZ)

Ex-

START LDAA 24 H

Here Label START is equal to the address of the instruction LDAA 24 H -gt MnemonicOpcode -This field contains a mnemonic It stands for operation code or pseudo-opie a machine code instruction it requires additional information ieoperands -gt Operand Field -it specifies either the address or the data that the opcode requires -gt Comment Field allows the programmer to document the software This field is optional and is only printed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 29

SP NOTES

on the source listing for documentation purposes The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character

Types of Assembly language statements

It consists of 3 typeseg

(i)Imperative statement

(ii)Declarative statements

(iii)Assembler directive statements

Advantages amp disadvantages of Assembly language

Lecture_no19 Design Specification of 1-pass assembler with example

Following steps are used for design specification of an assembler-1)Identify the information necessary to perform a task2)Specify the data structure and define format of data structure3)Determine the processing necessary to obtain and maintain the information4)Determine the processing necessary to perform a task

Assembler Design can be done in

ndashSingle pass

ndashTwo pass

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 30

Source program

Pass1

OPTAB SYMTAB

Intermediate file

Pass2

SYMTAB

Object codes

OPTAB

SP NOTES

The Intermediate file include each source statement assigned address and error indicator

Pass I

Separate the symbols mnemonic op-code and operational fields Determine the storage requirement for every assembly language statement and up date the location

counter Build the symbol table (Table that is used to store each label and its corresponding value)

Pass II

Generate object code(perform actual translation)

One-Pass Assembler

The translation performed by an assembler is essentially a collection of substitutions machine operation code

for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines

bullSingle Pass AssemblerndashDoes everything in single passndashCannot resolve the forward referencing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 31

LOCCTR

SP NOTES

(detailed pass1 assembler flowchart)

Internal data structures1048698 the Operation Code Table (OPTAB) bull OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents1048698 the Symbol Table (SYMTAB) bull SYMTAB is used to store values (addresses) assigned to labels1048698 Location Counter (LOCCTR) bull This is a variable that is used to help in the assignment of address LOCCTR not LOCCTR + (instruction length)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 32

SP NOTES

Lecture_no20 Multi-pass assembler with example

Pass 1 Assign addresses to all statements in the program Save the values assigned to all labels for use in Pass 2 Perform some processing of assembler directivesPass 2 Assemble instructions Generate data values defined by BYTE WORD Perform processing of assembler directives not done in Pass 1 Write the object program and the assembly listing

(Detailed pass2 assembler flowchart)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 33

SP NOTES

Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler

The difference between one pass and two pass assemblers are-A one pass assembler passes over the source file exactly once in the same pass collecting the labels resolving future references and doing the actual assembly The difficult part is to resolve future label references and assemble code in one pass

A two pass assembler does two passes over the source file ( the second pass can be over a file generated in the first pass ) In the first pass all it does is looks for label definitions and introduces them in the symbol table In the second pass after the symbol table is complete it does the actual assembly by translating the operations and so on

Assembler Design Machine Dependent Assembler Features instruction formats and addressing modes program relocation

Machine Independent Assembler Features literals symbol-defining statements expressions program blocks control sections and program linking

Object Program

Header Col 1 H Col 2~7 Program name Col 8~13 Starting address (hex) Col 14-19 Length of object program in bytes (hex) Text Col1 T Col2~7 Starting address in this record (hex) Col 8~9 Length of object code in this record in bytes (hex) Col 10~69Object code (69-10+1)6=10 instructions End Col1 E Col 2~7 Address of 16 Col2 first executable instruction (hex) (END program_name)

Absolute ProgramThe program must be loaded at the specified (absolute) address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 34

SP NOTES

Relocatable Program

The actual starting address is not known until load time The program can be loaded at different address of memory

How to Solvea Modification recordb Base relativec Program counter relative

Modification record Col 1 M Col 2-7 Starting location of the address field to be modified relative to the beginning of the program Col 8-9 length of the address field to be modified in halfbytes

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 35

SP NOTES

Lecture_no22 Macro language amp Macro processor

MACRO PROCESSOR

The assembly language programmer often finds it necessary to repeat some statements or block of code several times in a program The block may consist of code to swap sets of registers do some arithmetic operations In this situation the programmer find a macro instruction facility useful Macro instruction (often called macros) are single line abbreviation for group of instructions In employing a macro the programmer essentially defines a single instruction to represent a block of code For every occurrence of this one-line macro instruction in his program the macro processing assembler substitute the entire block

A macro represents a commonly used group of statements in the source pro ramming language The macro processor replaces each macro instruction with the corresponding group of source language statements This is called expanding of macros

Obj

Basic Macro Processor Functions

Macro Definition and Usage

In this section we will present one more example to highlight salient aspects of macro-processor The example is very similar to Intels 8 bit microprocessor assembly language instruction

ExampleMACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 36

Source code (with macro)

Macro Processor

Expanded code

Compiler Assembler

SP NOTES

A macro definition is placed at the start of a program enclosed between the statements MACRO and ENDM A MACRO statement indicates that a macro definition starts while statement ENDM indicates the end of a macro definition Thus a group of statements start ing with MACRO and ending with ENDM constitutes one macro definition unit If many macros are to be defined in a program as many definition

modules will exist at the start of the program Each definition module contains a new operation and defines it to consist of a sequence of assembly language statement In example above INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence

The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement Appearance of a macro name in the mnemonic field amounts to a call on the macro The assembler replaces such a statement by the statement sequence comprising the macro This is known as macro expansion

INCRMT XY

is shown to lead to insertion of the assembly statements

LOAD X

ADD Y

STORE X

in its place All macro calls in a program are expanded in this fashion

Defining a Macro

Let us take another look at the macro definition unit

MACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

The macro header statement indicates the existence of a macro definition unit Absence of the header statement as the first statement of a program or ft first statement following a macro definition unit signals the start of the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 37

SP NOTES

main assembly language program The next statement in the definition unit is die prototype for a macro call This statement names the macro and indicates how the operands in any call on the macro would be written

The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion

Basic Macro Processor Functions

-gtDirectives used during usage of Macro

Macro Indicates begin of Macro MEND indicates end of Macro

-gtPrototype for Macro

Each argument starts with Name and macro Parameter list

MEND

-gtBODY The statement will be generated as the expansion of Macro

Macro Expansion

Each macro invocation statement will be expanded into the statements that form the body of the macro

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 38

SP NOTES

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)

In the definition of macro parameter In the macro invocation argument

Lecture_no23 Machine-Independent Macro Processor Features

MACRO INVOCATION

A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro

macro_name p1 p2 hellip

Difference between macro call and procedure call

Macro call statements of the macro body are expanded each time the macro is invoked

Procedure call statements of the subroutine appear only one regardless of how many times the subroutine is called

Macro parameters

i) Positional parameter

The prototype statement indicates how operands in a macro call would be written These operands are called parameters or arguments All parameters used in the prototype statement have names starting with the special character amp These parameters are known as formal parameters A macro call is written using parameter names which do not start with ft special character amp These are known as actual parameters

The lists of formal and actual parameters also called as formal and actual parameter lists specified in the prototype and macro call statements respectively establish a correspondence between each formal parameter and an actual parameter In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists Thus the first actual parameter in the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 39

SP NOTES

fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again

INCRMT ampA ampB prototype

INCRMT XY macro call

We see that X would be paired with ampA and Y with ampB While expanding a macro call any formal parameter appearing within a model statement is replaced by the corresponding actual parameter This is how expansion of the call INCR XY heads to the following statements

LOAD X

ADD Y

STORE X

Example

STRG DATA1 DATA2 DATA3

GENER DIRECT3

(ii) Keyword parameter

Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter

ExampleSTRG ampa3=DATA1 ampa2=DATA2 ampa1=DATA3

GENER TYPE=DIRECT CHANNEL=3

Features of macro facility

(1)Macro instruction argument

(2)Macro calls within macro(nested macro calls)

(3)Macro instruction defining(macros within macro)

(4)Conditional macro ExpansionLecture_no24 Design of 1-pass macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 40

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Problem 2)

a) Provide micro-instructions in the form of detailed flowchart for the execution of the following instructions

b) Show the flow of the micro-instructions within the GPR architecture by highlighting the system buses and the registers used

i)move X D0 move contents of memory location X to register D0ii)move D0 X move contents of register D0 to memory location Xiii)movea 22(a3)25(a1) move contents from memory to memoryiv)bgtw label branch if greater than to location ldquolabelrdquo

Answer 2)

For each of the above for part b) draw the CPU consisting of all the registers (for example MAR MBR PC IR Address and Data Registers ALU) and Memory and show each of the micro-instruction as in part a)

i) move X D0Machine Instruction

0011 0000 0011 1000

Address of X

a) Micro-steps are

MAR lt-- PC

MBR lt-- M[MAR] fetch

IR lt-- MBR

| decode |PC lt-- PC + 2 PC points to source addressMAR lt-- PCMBR lt-- M[MAR] address of X in MBRMAR MBR move X to MARMBR M[MAR] read contents of location XD0 MBR PC lt-- PC + 2 PC points to next instruction

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 14

SP NOTES

ii) move D0 X

Machine Instruction

0011 1110 0000 0000

Address of X

a) Micro-steps areMAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | decode |PC lt-- PC + 2 PC points to destination addressMAR lt-- PCMBR M[MAR] read destination address XMAR MBR MAR has X now

MBR D0 source data in MBR

M[MAR] MBR data in Effective address X (which is in MAR)

PC lt-- PC + 2 PC points to next instruction

iii) movea 22(a3)25(a1)

Machine Instruction

0011 0011 0110 1011

0000 0000 0001 0110

0000 0000 0001 1001

a)Source Effective address = contents of a3 + $16temp lt--- M[a3 + $16]

Destination Effective address = contents of a1 + $19M[a1 + $19] lt- temp

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 15

SP NOTES

Micro-steps areMAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | decode |PC lt-- PC + 2 PC points to source displacementMAR lt-- PCMBR lt-- M[MAR] source displacement loadedMAR lt-- [A3 + MBR] source effective address calculatedMBR lt- M[MAR] source data in MBRWR lt- MBR saved temporarily since MBR is to be reusedPC lt-- PC + 2 PC points to destination displacementMAR lt-- PCMBR lt-- M[MAR] destination displacement loaded

MAR lt-- [A1 + MBR] destination effective address calculatedMBR lt- WR source data back to MBR

M[MAR] lt-- MBR source data moved to destination

PC lt-- PC + 2 PC points to next instruction

iv) bgtw label

Machine Instruction

0110 1110 0000 0000 ---- displacement ----

a)MAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | interpret decode |

TRUE FALSE | |

PC lt-- PC + 2 PC lt- PC + 4 MAR lt-- PC MBR lt-- M[MAR] get displacement PC lt-- PC + MBR execute

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 16

SP NOTES

Lecture_no8 Memory unit Register Data of IBM 360

General machine structure of IBM 360370

The IBM System360 (S360) was a mainframe computer system family announced by IBM on April 7 1964 and delivered between 1965 and 1978 It was the first family of computers designed to cover the complete range of applications from small to large both commercial and scientific

(i)Memory unit

Memory (storage) in System360 is addressed in terms of 8-bit bytes Various instructions operate on larger units called halfword (2 bytes) fullword (4 bytes) doubleword (8 bytes) quad word (16 bytes) and 2048 byte storage block specifying the leftmost (lowest address) of the unit Within a halfword fullword doubleword or quadword low numbered bytes are more significant than high numbered bytes this is sometimes referred to as big-endian Many uses for these units require aligning them on the corresponding boundaries Within this article the unqualified term word refers to a fullword

The architecture of System360 provided for up to 224 = 16777216 bytes of memory however the Model 67 extended the architecture and allowed 232 = 4294967296 bytes of virtual memory

(ii)Register-

Registers are areas of storage that are set aside in the processing unit

There are 16 registers in the 370 system (General purpose registerFloating-point registerControl amp status registerdata register address register)

Four fields of information-1) the name field- optional field and must begin with a character from the alphabet can consist of 8 characters2) opcode ndash mandatory field and must be followed by a blank3) operands ndash usually represent data that could be in main storage registers or as part of the instruction itself4) comments ndash used to clarify instructions

(iii)Data

Characters are stored as 8-bit bytes Integers are stored as twos complement binary halfword or fullword values Packed decimal numbers are stored as 1-16 8-bit bytes containing an odd number of decimal digits

followed by a 4-bit sign Sign values of hexadecimal A C E and F are positive and sign values of hexadecimal B and D are negative Digit values of hexadecimal A-F and sign values of 0-9 are invalid but the PACK and UNPK instructions do not test for validity

Zoned decimal numbers are stored as 1-16 8-bit bytes each containing a zone in bits 0-3 and a digit in bits 4-7 The zone of the rightmost byte is interpreted as a sign

Floating point numbers are only stored as fullword or doubleword values on older models On the 36085and 360195there are also extended precision floating point numbers stored as quadwords For all

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 17

SP NOTES

three formats bit 0 is a sign and bits 0-7 are a characteristic (exponent biased by 64) Bits 8-31 (8-63) are a hexadecimal fraction For extended precision the low order doubleword has its own sign and characteristic which are ignored on input and generated on output

Lecture_no9 Various instruction format of IBM 360370

( iv)Instruction

Instructions in the S360 are two four or six bytes in length with the opcode in byte 0 Instructions have one of the following formats

RR (two bytes) R tells that there is data residing in a register So in this type there is data in register R

Generally byte 1 specifies two 4-bit register numbers but in some cases eg SVC byte 1 is a single 8-bit immediate field

RS (four bytes) This instruction can have 2 or 3 operands It depends how many the opcode indicates If it is 2 the first is a register and the second storage If it is 3 then the first and second are registers and the third is a storage location

Byte 1 specifies two register numbers bytes 2-3 specify a base and displacement

RX (four bytes) The X also refers to a storage address however this specifies actual storage addresses in 3 pieces

Byte 1 bits 0-3 specifies either a register number or a modifier byte 1 bits 4-7 specifies the number of the general register to be used as an index bytes 2-3 specify a base and displacement

SI (four bytes) I tells that the operand consists of immediate data (meaning that the data is in the instruction itself) So in this type the first is a storage location and the second is immediate data

Byte 1 specifies an immediate field bytes 2-3 specify a base and displacement

SS (six bytes) S tells that that there is data in storage Here both operands have data in storage

Byte 1 specifies two 4-bit length fields or one 8-bit length field bytes 2-3 and 4-5 each specify a base and displacement The encoding of the length fields is length-1

Instructions must be on a two-byte boundary in memory hence the low-order bit of the instruction address is always 0

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 18

SP NOTES

Addressing Mode

1Implied mode Operand is in a register implied by the OpCode of the instruction ExampleADD 312 Immediate mode Operand is a constant value specified in the instruction itself ExampleADD R 103 Register mode Operand is in register specified in the instruction itself ExampleADD S D

4 Register indirect mode Operand are is in a memory address that is content of a register specifiedby the instruction itself ExampleADD (D) 35 Direct mode Absolute address of operand is explicitly specified in instruction ExampleADD 1234 S6 Indirect mode Absolute address of operand is content of a memory address ExampleADD [1234] 107 Relative mode Content of PC + OpAddr ExampleADD D $S8 Indexed mode Content of an index register + OpAddr ExampleADD D 500(S)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 19

SP NOTES

Lecture_no10 Machine language Long way no looping method

Elementary Instruction Set

Typical Data Transfer Instructions

Typical Arithmetic Instructions

Typical Logical and Bit Manipulation Instr

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 20

SP NOTES

Example Write a program that compare two positive numbers x and y Output is -1 if x lt y 0if x = y +1 if x gt y

MOVE R1 x MOVE R2 y MOVE R3 0 SUB R1 R2 BN Less BZ Zero MOVE R3 +1Less MOVE R3 -1Zero MOVE R3 0 HLT

Lecture_no11 Address modification using instruction as data ampusing index register

Lecture_no12 Looping method with example

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 21

SP NOTES

Lecture_no13 Assembly language machine language

Assembly LanguageAssembly instructions are just shorthand for machine instructions

Machine language Equivalent assembly1000000100100101 LOAD R1 51000000101000101 LOAD R2 51010000100000110 ADD R0 R1 R21000001000000110 SAVE R0 61111111111111111 HALT (For all assembly instructions that compute a result the first argument is the destination)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 22

SP NOTES

Very easy to write an Assembly Language -gtMachine language translator

Exercise

What would be the assembly instructions to swap the contents of registers 1 amp 2Exercise Solution

STORE 1 R1MOVE R1 R2LOAD R2 1HALT

Machine Language

The machine language for a particular computer is tied to the architecture of the CPU For example G4 Macs have a different machine language than Intel PCs We will look at the machine language of a simple simulated computer

Machine Language ExampleOur computer has 4 registers and 32 memory locations Each instruction is 16 bits Here is a machine language program for our simulated computer

10000001001001011000000101000101101000010000011010000010000001101111111111111111

LOAD contents of memorylocation into register

100000010 RR MMMMMex R0 = Mem[3]100000010 00 00011

STORE contents of registerinto memory location

100000100 RR MMMMMex Mem[4] = R0100000100 00 00100

MOVE contents of oneregister into another register

100100010000 RR RRex R0 = R1100100010000 00 01

ADD contents of 2 registersstore result in third

1010000100 RR RR RRex R0 = R1 + R21010000100 00 01 10

SUBTRACT contents of 2registers store result into third

1010001000 RR RR RRex R0 = R1 ndash R2

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 23

SP NOTES

1010001000 00 01 10

Halt the program 1111111111111111

Lecture_no14 Meaning of Assembly language program types Literals

bullUSING

bullBALR

bullSTART

bullEND

bullBR

bullLiteral

bullLTORG

bullBCT

bullEQU ExampleSTART EQU 10

The directive EQU stands for equals it allows you to give a numerical value to a label In this example the assembler would always replace START with 1016 Hence you could write

START EQU 10ORG STARTGOTO 10

bullORG

The directive ORG does not get translated into any code what it does is to set the address at which the next instruction will be stored

Example

ORG 0GOTO 10

The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000

But it will be placed at the address 0 ORG 0 does not get translated into any code but it affects where the code for GOTO 10 is placed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 24

SP NOTES

bullpseudo-ops vs machine-ops

Lecture_no15 Example on Assembly language program

Assembly language is a low-level programming language for a computer or other programmable device specific to a particular computer architecture in contrast to most high-level programming languages which are generally portable across multiple systems Assembly language is converted into executable machine code by a utility program referred to as an assembler like NASM MASM etc

LABEL OPCODE OPERANDS COMMENTS01 02 Program to multiply an integer by the constant 603 Before execution an integer must be stored in NUMBER0405 ORIG x305006 LEA R2 NUMBER07 LDW R2 R2 008 LEA R1 SIX09 LDW R1 R1 00A AND R3R30 Clear R3 It will0B contain the product0C The inner loop0D 0E AGAIN ADD R3R3R20F ADD R1R1-1 R1 keeps track of10 BRp AGAIN the iterations11 12 HALT13

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 25

SP NOTES

14 NUMBER BLKW 115 SIX FILL x000616 17 END

Figure 71 An assembly language programTwo of the parts (LABEL and COMMENTS) are optionalOpcodes and OperandsTwo of the parts (OPCODE and OPERANDS) are mandatory The OPCODE is a symbolic name for the opcode

The number of operands depends on the operation being performedAGAIN ADD R3R3R2The operands to be added are obtained from register 2 and from register 3 The resultis to be placed in register 3 We represent each of the registers 0 through 7 as R0 R2helliphelliphelliphellip R7

The location whose address is to be read is given the label NUMBER The destination into which that address is to be loaded is register 2LEA R2 NUMBERA literal value must contain a symbol identifying the representation base of the number We use for decimal x for hexadecimal and b for binary

LabelsLabels are symbolic names which are used to identify memory locations that are referred to explicitly in the programThere are two reasons for explicitly referring to a memory location1 The location contains the target of a branch instruction (for example AGAIN in line 0E)2 The location contains a value that is loaded or stored (for example NUMBER line 14 and SIX line 15)The location AGAIN is specifically referenced by the branch instruction in line 10

BRp AGAIN

If the result of ADD R1R1ndash1 is positive (as evidenced by the P condition code being set) then the program branches to the location explicitly referenced as AGAIN to perform another iterationThe value stored in the memory location explicitly referenced as NUMBER is loaded into R2If a location in the program is not explicitly referenced then there is no need to give it a label

CommentsComments are messages intended only for human consumption The purpose of comments is to make the program more comprehensible to the human reader

Pseudo-ops (Assembler Directives)

Actually a more formal name for a pseudo-op is assembler directive They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution Rather the pseudo-op is strictly a message to the assembler to help the assembler in the assembly processORIG

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 26

SP NOTES

ORIG tells the assembler where in memory to place the program In line 05 ORIG x3050 says start with location x3050 As a result the LEA R2NUMBER instruction will be put in location x3050FILLFILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand In line 15 the ninth location in the resultant program is initialized to the value x0006BLKWBLKW tells the assembler to set aside some number of sequential memory locations (ie a BLocK ofWords) in the program The actual number is the operand of the BLKW pseudo-op In line 11 the pseudo-op instructs the assembler to set aside one location in memory

Lecture_no16 surprise test

Lecture_no17 MODULE-2

AssemblerWhy learning AssemblerAssembler or other languages that is the question Why should I learn another language if I alreadylearned other programming languages The best argument while you live in France you are able to getthrough by speaking english but you will never feel at home then and life remains complicated You canget through with this but it is rather inappropriate If things need a hurry you should use the countryslanguageAssemblerTo translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmerDisassenblerTo translate machine code to their assembly source

Cross AssemblerThe assembler may run on one machine and produce object code for another

An assembly is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader

(Fig Assembler)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 27

SP NOTES

For example the externally defined symbols (library program) must be indicated to the loaderthe assembler does not know the address of these symbols and it is up to the loader to find theprograms containing them load them into memory and place the values of these symbols in thecalling program

(Fig Program Translation Steps)

Fundamental assembler functions1048698 Translate mnemonic operation codes their machine language equivalents1048698 Assign machine addresses to symbolic labels used by the programmer

Basic Assembler Functions

Convert mnemonic operation codes to machine language equivalents

Convert symbolic operands to machine addresses (pass 1)

Build machine instructions in the proper format

Convert data constants to internal (machine)representations

Error checking is provided

Write the object program and assembly listing files

Assembler directives

-gtThe assembler can also process assembler directivesbull Assembler directives (or pseudo-instructions) provide instructions to the assembler itself Theyare not translated into machine instructions

-gtPseudo-Instructions Not translated into machine instructions Providing information to the assembler-gtBasic assembler directives-

bull START -Specify name and staring address for the program END- Indicate the end of the source program and (optionally) specify the first executable instruction in

the program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 28

SP NOTES

BYTE -Generate character or hexadecimal constant occupying as many bytes as needed to represent the constant

WORD- Generate one-word integer constant RESB- Reserve the indicated number of bytes for a data area RESW- Reserve the indicated number of words for a data area

Lecture_no18 General design procedure of 360-type Assembler

Syntax Assembly language statements

Assembly language statements are entered one statement per line Each statement follows the following format

[label] mnemonic [operands] [comment]

The fields in the square brackets are optional A basic instruction has two parts the first one is the name of the instruction (or the mnemonic) which is to be executed and the second are the operands or the parameters of the command

An assembly language statement contains the following fields -gt Label Field -it is an identifier amp optional fieldThe maximum lengnth of label differs between assemblers It can be used to define a symbolSome accepts upto 32 characters long other only 4 charactersA label when declared is suffixed by a colon and begins with valid characters(AhelliphellipZ)

Ex-

START LDAA 24 H

Here Label START is equal to the address of the instruction LDAA 24 H -gt MnemonicOpcode -This field contains a mnemonic It stands for operation code or pseudo-opie a machine code instruction it requires additional information ieoperands -gt Operand Field -it specifies either the address or the data that the opcode requires -gt Comment Field allows the programmer to document the software This field is optional and is only printed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 29

SP NOTES

on the source listing for documentation purposes The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character

Types of Assembly language statements

It consists of 3 typeseg

(i)Imperative statement

(ii)Declarative statements

(iii)Assembler directive statements

Advantages amp disadvantages of Assembly language

Lecture_no19 Design Specification of 1-pass assembler with example

Following steps are used for design specification of an assembler-1)Identify the information necessary to perform a task2)Specify the data structure and define format of data structure3)Determine the processing necessary to obtain and maintain the information4)Determine the processing necessary to perform a task

Assembler Design can be done in

ndashSingle pass

ndashTwo pass

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 30

Source program

Pass1

OPTAB SYMTAB

Intermediate file

Pass2

SYMTAB

Object codes

OPTAB

SP NOTES

The Intermediate file include each source statement assigned address and error indicator

Pass I

Separate the symbols mnemonic op-code and operational fields Determine the storage requirement for every assembly language statement and up date the location

counter Build the symbol table (Table that is used to store each label and its corresponding value)

Pass II

Generate object code(perform actual translation)

One-Pass Assembler

The translation performed by an assembler is essentially a collection of substitutions machine operation code

for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines

bullSingle Pass AssemblerndashDoes everything in single passndashCannot resolve the forward referencing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 31

LOCCTR

SP NOTES

(detailed pass1 assembler flowchart)

Internal data structures1048698 the Operation Code Table (OPTAB) bull OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents1048698 the Symbol Table (SYMTAB) bull SYMTAB is used to store values (addresses) assigned to labels1048698 Location Counter (LOCCTR) bull This is a variable that is used to help in the assignment of address LOCCTR not LOCCTR + (instruction length)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 32

SP NOTES

Lecture_no20 Multi-pass assembler with example

Pass 1 Assign addresses to all statements in the program Save the values assigned to all labels for use in Pass 2 Perform some processing of assembler directivesPass 2 Assemble instructions Generate data values defined by BYTE WORD Perform processing of assembler directives not done in Pass 1 Write the object program and the assembly listing

(Detailed pass2 assembler flowchart)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 33

SP NOTES

Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler

The difference between one pass and two pass assemblers are-A one pass assembler passes over the source file exactly once in the same pass collecting the labels resolving future references and doing the actual assembly The difficult part is to resolve future label references and assemble code in one pass

A two pass assembler does two passes over the source file ( the second pass can be over a file generated in the first pass ) In the first pass all it does is looks for label definitions and introduces them in the symbol table In the second pass after the symbol table is complete it does the actual assembly by translating the operations and so on

Assembler Design Machine Dependent Assembler Features instruction formats and addressing modes program relocation

Machine Independent Assembler Features literals symbol-defining statements expressions program blocks control sections and program linking

Object Program

Header Col 1 H Col 2~7 Program name Col 8~13 Starting address (hex) Col 14-19 Length of object program in bytes (hex) Text Col1 T Col2~7 Starting address in this record (hex) Col 8~9 Length of object code in this record in bytes (hex) Col 10~69Object code (69-10+1)6=10 instructions End Col1 E Col 2~7 Address of 16 Col2 first executable instruction (hex) (END program_name)

Absolute ProgramThe program must be loaded at the specified (absolute) address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 34

SP NOTES

Relocatable Program

The actual starting address is not known until load time The program can be loaded at different address of memory

How to Solvea Modification recordb Base relativec Program counter relative

Modification record Col 1 M Col 2-7 Starting location of the address field to be modified relative to the beginning of the program Col 8-9 length of the address field to be modified in halfbytes

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 35

SP NOTES

Lecture_no22 Macro language amp Macro processor

MACRO PROCESSOR

The assembly language programmer often finds it necessary to repeat some statements or block of code several times in a program The block may consist of code to swap sets of registers do some arithmetic operations In this situation the programmer find a macro instruction facility useful Macro instruction (often called macros) are single line abbreviation for group of instructions In employing a macro the programmer essentially defines a single instruction to represent a block of code For every occurrence of this one-line macro instruction in his program the macro processing assembler substitute the entire block

A macro represents a commonly used group of statements in the source pro ramming language The macro processor replaces each macro instruction with the corresponding group of source language statements This is called expanding of macros

Obj

Basic Macro Processor Functions

Macro Definition and Usage

In this section we will present one more example to highlight salient aspects of macro-processor The example is very similar to Intels 8 bit microprocessor assembly language instruction

ExampleMACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 36

Source code (with macro)

Macro Processor

Expanded code

Compiler Assembler

SP NOTES

A macro definition is placed at the start of a program enclosed between the statements MACRO and ENDM A MACRO statement indicates that a macro definition starts while statement ENDM indicates the end of a macro definition Thus a group of statements start ing with MACRO and ending with ENDM constitutes one macro definition unit If many macros are to be defined in a program as many definition

modules will exist at the start of the program Each definition module contains a new operation and defines it to consist of a sequence of assembly language statement In example above INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence

The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement Appearance of a macro name in the mnemonic field amounts to a call on the macro The assembler replaces such a statement by the statement sequence comprising the macro This is known as macro expansion

INCRMT XY

is shown to lead to insertion of the assembly statements

LOAD X

ADD Y

STORE X

in its place All macro calls in a program are expanded in this fashion

Defining a Macro

Let us take another look at the macro definition unit

MACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

The macro header statement indicates the existence of a macro definition unit Absence of the header statement as the first statement of a program or ft first statement following a macro definition unit signals the start of the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 37

SP NOTES

main assembly language program The next statement in the definition unit is die prototype for a macro call This statement names the macro and indicates how the operands in any call on the macro would be written

The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion

Basic Macro Processor Functions

-gtDirectives used during usage of Macro

Macro Indicates begin of Macro MEND indicates end of Macro

-gtPrototype for Macro

Each argument starts with Name and macro Parameter list

MEND

-gtBODY The statement will be generated as the expansion of Macro

Macro Expansion

Each macro invocation statement will be expanded into the statements that form the body of the macro

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 38

SP NOTES

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)

In the definition of macro parameter In the macro invocation argument

Lecture_no23 Machine-Independent Macro Processor Features

MACRO INVOCATION

A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro

macro_name p1 p2 hellip

Difference between macro call and procedure call

Macro call statements of the macro body are expanded each time the macro is invoked

Procedure call statements of the subroutine appear only one regardless of how many times the subroutine is called

Macro parameters

i) Positional parameter

The prototype statement indicates how operands in a macro call would be written These operands are called parameters or arguments All parameters used in the prototype statement have names starting with the special character amp These parameters are known as formal parameters A macro call is written using parameter names which do not start with ft special character amp These are known as actual parameters

The lists of formal and actual parameters also called as formal and actual parameter lists specified in the prototype and macro call statements respectively establish a correspondence between each formal parameter and an actual parameter In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists Thus the first actual parameter in the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 39

SP NOTES

fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again

INCRMT ampA ampB prototype

INCRMT XY macro call

We see that X would be paired with ampA and Y with ampB While expanding a macro call any formal parameter appearing within a model statement is replaced by the corresponding actual parameter This is how expansion of the call INCR XY heads to the following statements

LOAD X

ADD Y

STORE X

Example

STRG DATA1 DATA2 DATA3

GENER DIRECT3

(ii) Keyword parameter

Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter

ExampleSTRG ampa3=DATA1 ampa2=DATA2 ampa1=DATA3

GENER TYPE=DIRECT CHANNEL=3

Features of macro facility

(1)Macro instruction argument

(2)Macro calls within macro(nested macro calls)

(3)Macro instruction defining(macros within macro)

(4)Conditional macro ExpansionLecture_no24 Design of 1-pass macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 40

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

ii) move D0 X

Machine Instruction

0011 1110 0000 0000

Address of X

a) Micro-steps areMAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | decode |PC lt-- PC + 2 PC points to destination addressMAR lt-- PCMBR M[MAR] read destination address XMAR MBR MAR has X now

MBR D0 source data in MBR

M[MAR] MBR data in Effective address X (which is in MAR)

PC lt-- PC + 2 PC points to next instruction

iii) movea 22(a3)25(a1)

Machine Instruction

0011 0011 0110 1011

0000 0000 0001 0110

0000 0000 0001 1001

a)Source Effective address = contents of a3 + $16temp lt--- M[a3 + $16]

Destination Effective address = contents of a1 + $19M[a1 + $19] lt- temp

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 15

SP NOTES

Micro-steps areMAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | decode |PC lt-- PC + 2 PC points to source displacementMAR lt-- PCMBR lt-- M[MAR] source displacement loadedMAR lt-- [A3 + MBR] source effective address calculatedMBR lt- M[MAR] source data in MBRWR lt- MBR saved temporarily since MBR is to be reusedPC lt-- PC + 2 PC points to destination displacementMAR lt-- PCMBR lt-- M[MAR] destination displacement loaded

MAR lt-- [A1 + MBR] destination effective address calculatedMBR lt- WR source data back to MBR

M[MAR] lt-- MBR source data moved to destination

PC lt-- PC + 2 PC points to next instruction

iv) bgtw label

Machine Instruction

0110 1110 0000 0000 ---- displacement ----

a)MAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | interpret decode |

TRUE FALSE | |

PC lt-- PC + 2 PC lt- PC + 4 MAR lt-- PC MBR lt-- M[MAR] get displacement PC lt-- PC + MBR execute

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 16

SP NOTES

Lecture_no8 Memory unit Register Data of IBM 360

General machine structure of IBM 360370

The IBM System360 (S360) was a mainframe computer system family announced by IBM on April 7 1964 and delivered between 1965 and 1978 It was the first family of computers designed to cover the complete range of applications from small to large both commercial and scientific

(i)Memory unit

Memory (storage) in System360 is addressed in terms of 8-bit bytes Various instructions operate on larger units called halfword (2 bytes) fullword (4 bytes) doubleword (8 bytes) quad word (16 bytes) and 2048 byte storage block specifying the leftmost (lowest address) of the unit Within a halfword fullword doubleword or quadword low numbered bytes are more significant than high numbered bytes this is sometimes referred to as big-endian Many uses for these units require aligning them on the corresponding boundaries Within this article the unqualified term word refers to a fullword

The architecture of System360 provided for up to 224 = 16777216 bytes of memory however the Model 67 extended the architecture and allowed 232 = 4294967296 bytes of virtual memory

(ii)Register-

Registers are areas of storage that are set aside in the processing unit

There are 16 registers in the 370 system (General purpose registerFloating-point registerControl amp status registerdata register address register)

Four fields of information-1) the name field- optional field and must begin with a character from the alphabet can consist of 8 characters2) opcode ndash mandatory field and must be followed by a blank3) operands ndash usually represent data that could be in main storage registers or as part of the instruction itself4) comments ndash used to clarify instructions

(iii)Data

Characters are stored as 8-bit bytes Integers are stored as twos complement binary halfword or fullword values Packed decimal numbers are stored as 1-16 8-bit bytes containing an odd number of decimal digits

followed by a 4-bit sign Sign values of hexadecimal A C E and F are positive and sign values of hexadecimal B and D are negative Digit values of hexadecimal A-F and sign values of 0-9 are invalid but the PACK and UNPK instructions do not test for validity

Zoned decimal numbers are stored as 1-16 8-bit bytes each containing a zone in bits 0-3 and a digit in bits 4-7 The zone of the rightmost byte is interpreted as a sign

Floating point numbers are only stored as fullword or doubleword values on older models On the 36085and 360195there are also extended precision floating point numbers stored as quadwords For all

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 17

SP NOTES

three formats bit 0 is a sign and bits 0-7 are a characteristic (exponent biased by 64) Bits 8-31 (8-63) are a hexadecimal fraction For extended precision the low order doubleword has its own sign and characteristic which are ignored on input and generated on output

Lecture_no9 Various instruction format of IBM 360370

( iv)Instruction

Instructions in the S360 are two four or six bytes in length with the opcode in byte 0 Instructions have one of the following formats

RR (two bytes) R tells that there is data residing in a register So in this type there is data in register R

Generally byte 1 specifies two 4-bit register numbers but in some cases eg SVC byte 1 is a single 8-bit immediate field

RS (four bytes) This instruction can have 2 or 3 operands It depends how many the opcode indicates If it is 2 the first is a register and the second storage If it is 3 then the first and second are registers and the third is a storage location

Byte 1 specifies two register numbers bytes 2-3 specify a base and displacement

RX (four bytes) The X also refers to a storage address however this specifies actual storage addresses in 3 pieces

Byte 1 bits 0-3 specifies either a register number or a modifier byte 1 bits 4-7 specifies the number of the general register to be used as an index bytes 2-3 specify a base and displacement

SI (four bytes) I tells that the operand consists of immediate data (meaning that the data is in the instruction itself) So in this type the first is a storage location and the second is immediate data

Byte 1 specifies an immediate field bytes 2-3 specify a base and displacement

SS (six bytes) S tells that that there is data in storage Here both operands have data in storage

Byte 1 specifies two 4-bit length fields or one 8-bit length field bytes 2-3 and 4-5 each specify a base and displacement The encoding of the length fields is length-1

Instructions must be on a two-byte boundary in memory hence the low-order bit of the instruction address is always 0

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 18

SP NOTES

Addressing Mode

1Implied mode Operand is in a register implied by the OpCode of the instruction ExampleADD 312 Immediate mode Operand is a constant value specified in the instruction itself ExampleADD R 103 Register mode Operand is in register specified in the instruction itself ExampleADD S D

4 Register indirect mode Operand are is in a memory address that is content of a register specifiedby the instruction itself ExampleADD (D) 35 Direct mode Absolute address of operand is explicitly specified in instruction ExampleADD 1234 S6 Indirect mode Absolute address of operand is content of a memory address ExampleADD [1234] 107 Relative mode Content of PC + OpAddr ExampleADD D $S8 Indexed mode Content of an index register + OpAddr ExampleADD D 500(S)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 19

SP NOTES

Lecture_no10 Machine language Long way no looping method

Elementary Instruction Set

Typical Data Transfer Instructions

Typical Arithmetic Instructions

Typical Logical and Bit Manipulation Instr

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 20

SP NOTES

Example Write a program that compare two positive numbers x and y Output is -1 if x lt y 0if x = y +1 if x gt y

MOVE R1 x MOVE R2 y MOVE R3 0 SUB R1 R2 BN Less BZ Zero MOVE R3 +1Less MOVE R3 -1Zero MOVE R3 0 HLT

Lecture_no11 Address modification using instruction as data ampusing index register

Lecture_no12 Looping method with example

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 21

SP NOTES

Lecture_no13 Assembly language machine language

Assembly LanguageAssembly instructions are just shorthand for machine instructions

Machine language Equivalent assembly1000000100100101 LOAD R1 51000000101000101 LOAD R2 51010000100000110 ADD R0 R1 R21000001000000110 SAVE R0 61111111111111111 HALT (For all assembly instructions that compute a result the first argument is the destination)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 22

SP NOTES

Very easy to write an Assembly Language -gtMachine language translator

Exercise

What would be the assembly instructions to swap the contents of registers 1 amp 2Exercise Solution

STORE 1 R1MOVE R1 R2LOAD R2 1HALT

Machine Language

The machine language for a particular computer is tied to the architecture of the CPU For example G4 Macs have a different machine language than Intel PCs We will look at the machine language of a simple simulated computer

Machine Language ExampleOur computer has 4 registers and 32 memory locations Each instruction is 16 bits Here is a machine language program for our simulated computer

10000001001001011000000101000101101000010000011010000010000001101111111111111111

LOAD contents of memorylocation into register

100000010 RR MMMMMex R0 = Mem[3]100000010 00 00011

STORE contents of registerinto memory location

100000100 RR MMMMMex Mem[4] = R0100000100 00 00100

MOVE contents of oneregister into another register

100100010000 RR RRex R0 = R1100100010000 00 01

ADD contents of 2 registersstore result in third

1010000100 RR RR RRex R0 = R1 + R21010000100 00 01 10

SUBTRACT contents of 2registers store result into third

1010001000 RR RR RRex R0 = R1 ndash R2

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 23

SP NOTES

1010001000 00 01 10

Halt the program 1111111111111111

Lecture_no14 Meaning of Assembly language program types Literals

bullUSING

bullBALR

bullSTART

bullEND

bullBR

bullLiteral

bullLTORG

bullBCT

bullEQU ExampleSTART EQU 10

The directive EQU stands for equals it allows you to give a numerical value to a label In this example the assembler would always replace START with 1016 Hence you could write

START EQU 10ORG STARTGOTO 10

bullORG

The directive ORG does not get translated into any code what it does is to set the address at which the next instruction will be stored

Example

ORG 0GOTO 10

The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000

But it will be placed at the address 0 ORG 0 does not get translated into any code but it affects where the code for GOTO 10 is placed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 24

SP NOTES

bullpseudo-ops vs machine-ops

Lecture_no15 Example on Assembly language program

Assembly language is a low-level programming language for a computer or other programmable device specific to a particular computer architecture in contrast to most high-level programming languages which are generally portable across multiple systems Assembly language is converted into executable machine code by a utility program referred to as an assembler like NASM MASM etc

LABEL OPCODE OPERANDS COMMENTS01 02 Program to multiply an integer by the constant 603 Before execution an integer must be stored in NUMBER0405 ORIG x305006 LEA R2 NUMBER07 LDW R2 R2 008 LEA R1 SIX09 LDW R1 R1 00A AND R3R30 Clear R3 It will0B contain the product0C The inner loop0D 0E AGAIN ADD R3R3R20F ADD R1R1-1 R1 keeps track of10 BRp AGAIN the iterations11 12 HALT13

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 25

SP NOTES

14 NUMBER BLKW 115 SIX FILL x000616 17 END

Figure 71 An assembly language programTwo of the parts (LABEL and COMMENTS) are optionalOpcodes and OperandsTwo of the parts (OPCODE and OPERANDS) are mandatory The OPCODE is a symbolic name for the opcode

The number of operands depends on the operation being performedAGAIN ADD R3R3R2The operands to be added are obtained from register 2 and from register 3 The resultis to be placed in register 3 We represent each of the registers 0 through 7 as R0 R2helliphelliphelliphellip R7

The location whose address is to be read is given the label NUMBER The destination into which that address is to be loaded is register 2LEA R2 NUMBERA literal value must contain a symbol identifying the representation base of the number We use for decimal x for hexadecimal and b for binary

LabelsLabels are symbolic names which are used to identify memory locations that are referred to explicitly in the programThere are two reasons for explicitly referring to a memory location1 The location contains the target of a branch instruction (for example AGAIN in line 0E)2 The location contains a value that is loaded or stored (for example NUMBER line 14 and SIX line 15)The location AGAIN is specifically referenced by the branch instruction in line 10

BRp AGAIN

If the result of ADD R1R1ndash1 is positive (as evidenced by the P condition code being set) then the program branches to the location explicitly referenced as AGAIN to perform another iterationThe value stored in the memory location explicitly referenced as NUMBER is loaded into R2If a location in the program is not explicitly referenced then there is no need to give it a label

CommentsComments are messages intended only for human consumption The purpose of comments is to make the program more comprehensible to the human reader

Pseudo-ops (Assembler Directives)

Actually a more formal name for a pseudo-op is assembler directive They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution Rather the pseudo-op is strictly a message to the assembler to help the assembler in the assembly processORIG

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 26

SP NOTES

ORIG tells the assembler where in memory to place the program In line 05 ORIG x3050 says start with location x3050 As a result the LEA R2NUMBER instruction will be put in location x3050FILLFILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand In line 15 the ninth location in the resultant program is initialized to the value x0006BLKWBLKW tells the assembler to set aside some number of sequential memory locations (ie a BLocK ofWords) in the program The actual number is the operand of the BLKW pseudo-op In line 11 the pseudo-op instructs the assembler to set aside one location in memory

Lecture_no16 surprise test

Lecture_no17 MODULE-2

AssemblerWhy learning AssemblerAssembler or other languages that is the question Why should I learn another language if I alreadylearned other programming languages The best argument while you live in France you are able to getthrough by speaking english but you will never feel at home then and life remains complicated You canget through with this but it is rather inappropriate If things need a hurry you should use the countryslanguageAssemblerTo translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmerDisassenblerTo translate machine code to their assembly source

Cross AssemblerThe assembler may run on one machine and produce object code for another

An assembly is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader

(Fig Assembler)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 27

SP NOTES

For example the externally defined symbols (library program) must be indicated to the loaderthe assembler does not know the address of these symbols and it is up to the loader to find theprograms containing them load them into memory and place the values of these symbols in thecalling program

(Fig Program Translation Steps)

Fundamental assembler functions1048698 Translate mnemonic operation codes their machine language equivalents1048698 Assign machine addresses to symbolic labels used by the programmer

Basic Assembler Functions

Convert mnemonic operation codes to machine language equivalents

Convert symbolic operands to machine addresses (pass 1)

Build machine instructions in the proper format

Convert data constants to internal (machine)representations

Error checking is provided

Write the object program and assembly listing files

Assembler directives

-gtThe assembler can also process assembler directivesbull Assembler directives (or pseudo-instructions) provide instructions to the assembler itself Theyare not translated into machine instructions

-gtPseudo-Instructions Not translated into machine instructions Providing information to the assembler-gtBasic assembler directives-

bull START -Specify name and staring address for the program END- Indicate the end of the source program and (optionally) specify the first executable instruction in

the program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 28

SP NOTES

BYTE -Generate character or hexadecimal constant occupying as many bytes as needed to represent the constant

WORD- Generate one-word integer constant RESB- Reserve the indicated number of bytes for a data area RESW- Reserve the indicated number of words for a data area

Lecture_no18 General design procedure of 360-type Assembler

Syntax Assembly language statements

Assembly language statements are entered one statement per line Each statement follows the following format

[label] mnemonic [operands] [comment]

The fields in the square brackets are optional A basic instruction has two parts the first one is the name of the instruction (or the mnemonic) which is to be executed and the second are the operands or the parameters of the command

An assembly language statement contains the following fields -gt Label Field -it is an identifier amp optional fieldThe maximum lengnth of label differs between assemblers It can be used to define a symbolSome accepts upto 32 characters long other only 4 charactersA label when declared is suffixed by a colon and begins with valid characters(AhelliphellipZ)

Ex-

START LDAA 24 H

Here Label START is equal to the address of the instruction LDAA 24 H -gt MnemonicOpcode -This field contains a mnemonic It stands for operation code or pseudo-opie a machine code instruction it requires additional information ieoperands -gt Operand Field -it specifies either the address or the data that the opcode requires -gt Comment Field allows the programmer to document the software This field is optional and is only printed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 29

SP NOTES

on the source listing for documentation purposes The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character

Types of Assembly language statements

It consists of 3 typeseg

(i)Imperative statement

(ii)Declarative statements

(iii)Assembler directive statements

Advantages amp disadvantages of Assembly language

Lecture_no19 Design Specification of 1-pass assembler with example

Following steps are used for design specification of an assembler-1)Identify the information necessary to perform a task2)Specify the data structure and define format of data structure3)Determine the processing necessary to obtain and maintain the information4)Determine the processing necessary to perform a task

Assembler Design can be done in

ndashSingle pass

ndashTwo pass

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 30

Source program

Pass1

OPTAB SYMTAB

Intermediate file

Pass2

SYMTAB

Object codes

OPTAB

SP NOTES

The Intermediate file include each source statement assigned address and error indicator

Pass I

Separate the symbols mnemonic op-code and operational fields Determine the storage requirement for every assembly language statement and up date the location

counter Build the symbol table (Table that is used to store each label and its corresponding value)

Pass II

Generate object code(perform actual translation)

One-Pass Assembler

The translation performed by an assembler is essentially a collection of substitutions machine operation code

for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines

bullSingle Pass AssemblerndashDoes everything in single passndashCannot resolve the forward referencing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 31

LOCCTR

SP NOTES

(detailed pass1 assembler flowchart)

Internal data structures1048698 the Operation Code Table (OPTAB) bull OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents1048698 the Symbol Table (SYMTAB) bull SYMTAB is used to store values (addresses) assigned to labels1048698 Location Counter (LOCCTR) bull This is a variable that is used to help in the assignment of address LOCCTR not LOCCTR + (instruction length)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 32

SP NOTES

Lecture_no20 Multi-pass assembler with example

Pass 1 Assign addresses to all statements in the program Save the values assigned to all labels for use in Pass 2 Perform some processing of assembler directivesPass 2 Assemble instructions Generate data values defined by BYTE WORD Perform processing of assembler directives not done in Pass 1 Write the object program and the assembly listing

(Detailed pass2 assembler flowchart)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 33

SP NOTES

Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler

The difference between one pass and two pass assemblers are-A one pass assembler passes over the source file exactly once in the same pass collecting the labels resolving future references and doing the actual assembly The difficult part is to resolve future label references and assemble code in one pass

A two pass assembler does two passes over the source file ( the second pass can be over a file generated in the first pass ) In the first pass all it does is looks for label definitions and introduces them in the symbol table In the second pass after the symbol table is complete it does the actual assembly by translating the operations and so on

Assembler Design Machine Dependent Assembler Features instruction formats and addressing modes program relocation

Machine Independent Assembler Features literals symbol-defining statements expressions program blocks control sections and program linking

Object Program

Header Col 1 H Col 2~7 Program name Col 8~13 Starting address (hex) Col 14-19 Length of object program in bytes (hex) Text Col1 T Col2~7 Starting address in this record (hex) Col 8~9 Length of object code in this record in bytes (hex) Col 10~69Object code (69-10+1)6=10 instructions End Col1 E Col 2~7 Address of 16 Col2 first executable instruction (hex) (END program_name)

Absolute ProgramThe program must be loaded at the specified (absolute) address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 34

SP NOTES

Relocatable Program

The actual starting address is not known until load time The program can be loaded at different address of memory

How to Solvea Modification recordb Base relativec Program counter relative

Modification record Col 1 M Col 2-7 Starting location of the address field to be modified relative to the beginning of the program Col 8-9 length of the address field to be modified in halfbytes

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 35

SP NOTES

Lecture_no22 Macro language amp Macro processor

MACRO PROCESSOR

The assembly language programmer often finds it necessary to repeat some statements or block of code several times in a program The block may consist of code to swap sets of registers do some arithmetic operations In this situation the programmer find a macro instruction facility useful Macro instruction (often called macros) are single line abbreviation for group of instructions In employing a macro the programmer essentially defines a single instruction to represent a block of code For every occurrence of this one-line macro instruction in his program the macro processing assembler substitute the entire block

A macro represents a commonly used group of statements in the source pro ramming language The macro processor replaces each macro instruction with the corresponding group of source language statements This is called expanding of macros

Obj

Basic Macro Processor Functions

Macro Definition and Usage

In this section we will present one more example to highlight salient aspects of macro-processor The example is very similar to Intels 8 bit microprocessor assembly language instruction

ExampleMACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 36

Source code (with macro)

Macro Processor

Expanded code

Compiler Assembler

SP NOTES

A macro definition is placed at the start of a program enclosed between the statements MACRO and ENDM A MACRO statement indicates that a macro definition starts while statement ENDM indicates the end of a macro definition Thus a group of statements start ing with MACRO and ending with ENDM constitutes one macro definition unit If many macros are to be defined in a program as many definition

modules will exist at the start of the program Each definition module contains a new operation and defines it to consist of a sequence of assembly language statement In example above INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence

The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement Appearance of a macro name in the mnemonic field amounts to a call on the macro The assembler replaces such a statement by the statement sequence comprising the macro This is known as macro expansion

INCRMT XY

is shown to lead to insertion of the assembly statements

LOAD X

ADD Y

STORE X

in its place All macro calls in a program are expanded in this fashion

Defining a Macro

Let us take another look at the macro definition unit

MACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

The macro header statement indicates the existence of a macro definition unit Absence of the header statement as the first statement of a program or ft first statement following a macro definition unit signals the start of the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 37

SP NOTES

main assembly language program The next statement in the definition unit is die prototype for a macro call This statement names the macro and indicates how the operands in any call on the macro would be written

The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion

Basic Macro Processor Functions

-gtDirectives used during usage of Macro

Macro Indicates begin of Macro MEND indicates end of Macro

-gtPrototype for Macro

Each argument starts with Name and macro Parameter list

MEND

-gtBODY The statement will be generated as the expansion of Macro

Macro Expansion

Each macro invocation statement will be expanded into the statements that form the body of the macro

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 38

SP NOTES

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)

In the definition of macro parameter In the macro invocation argument

Lecture_no23 Machine-Independent Macro Processor Features

MACRO INVOCATION

A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro

macro_name p1 p2 hellip

Difference between macro call and procedure call

Macro call statements of the macro body are expanded each time the macro is invoked

Procedure call statements of the subroutine appear only one regardless of how many times the subroutine is called

Macro parameters

i) Positional parameter

The prototype statement indicates how operands in a macro call would be written These operands are called parameters or arguments All parameters used in the prototype statement have names starting with the special character amp These parameters are known as formal parameters A macro call is written using parameter names which do not start with ft special character amp These are known as actual parameters

The lists of formal and actual parameters also called as formal and actual parameter lists specified in the prototype and macro call statements respectively establish a correspondence between each formal parameter and an actual parameter In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists Thus the first actual parameter in the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 39

SP NOTES

fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again

INCRMT ampA ampB prototype

INCRMT XY macro call

We see that X would be paired with ampA and Y with ampB While expanding a macro call any formal parameter appearing within a model statement is replaced by the corresponding actual parameter This is how expansion of the call INCR XY heads to the following statements

LOAD X

ADD Y

STORE X

Example

STRG DATA1 DATA2 DATA3

GENER DIRECT3

(ii) Keyword parameter

Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter

ExampleSTRG ampa3=DATA1 ampa2=DATA2 ampa1=DATA3

GENER TYPE=DIRECT CHANNEL=3

Features of macro facility

(1)Macro instruction argument

(2)Macro calls within macro(nested macro calls)

(3)Macro instruction defining(macros within macro)

(4)Conditional macro ExpansionLecture_no24 Design of 1-pass macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 40

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Micro-steps areMAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | decode |PC lt-- PC + 2 PC points to source displacementMAR lt-- PCMBR lt-- M[MAR] source displacement loadedMAR lt-- [A3 + MBR] source effective address calculatedMBR lt- M[MAR] source data in MBRWR lt- MBR saved temporarily since MBR is to be reusedPC lt-- PC + 2 PC points to destination displacementMAR lt-- PCMBR lt-- M[MAR] destination displacement loaded

MAR lt-- [A1 + MBR] destination effective address calculatedMBR lt- WR source data back to MBR

M[MAR] lt-- MBR source data moved to destination

PC lt-- PC + 2 PC points to next instruction

iv) bgtw label

Machine Instruction

0110 1110 0000 0000 ---- displacement ----

a)MAR lt-- PCMBR lt-- M[MAR] fetchIR lt-- MBR | interpret decode |

TRUE FALSE | |

PC lt-- PC + 2 PC lt- PC + 4 MAR lt-- PC MBR lt-- M[MAR] get displacement PC lt-- PC + MBR execute

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 16

SP NOTES

Lecture_no8 Memory unit Register Data of IBM 360

General machine structure of IBM 360370

The IBM System360 (S360) was a mainframe computer system family announced by IBM on April 7 1964 and delivered between 1965 and 1978 It was the first family of computers designed to cover the complete range of applications from small to large both commercial and scientific

(i)Memory unit

Memory (storage) in System360 is addressed in terms of 8-bit bytes Various instructions operate on larger units called halfword (2 bytes) fullword (4 bytes) doubleword (8 bytes) quad word (16 bytes) and 2048 byte storage block specifying the leftmost (lowest address) of the unit Within a halfword fullword doubleword or quadword low numbered bytes are more significant than high numbered bytes this is sometimes referred to as big-endian Many uses for these units require aligning them on the corresponding boundaries Within this article the unqualified term word refers to a fullword

The architecture of System360 provided for up to 224 = 16777216 bytes of memory however the Model 67 extended the architecture and allowed 232 = 4294967296 bytes of virtual memory

(ii)Register-

Registers are areas of storage that are set aside in the processing unit

There are 16 registers in the 370 system (General purpose registerFloating-point registerControl amp status registerdata register address register)

Four fields of information-1) the name field- optional field and must begin with a character from the alphabet can consist of 8 characters2) opcode ndash mandatory field and must be followed by a blank3) operands ndash usually represent data that could be in main storage registers or as part of the instruction itself4) comments ndash used to clarify instructions

(iii)Data

Characters are stored as 8-bit bytes Integers are stored as twos complement binary halfword or fullword values Packed decimal numbers are stored as 1-16 8-bit bytes containing an odd number of decimal digits

followed by a 4-bit sign Sign values of hexadecimal A C E and F are positive and sign values of hexadecimal B and D are negative Digit values of hexadecimal A-F and sign values of 0-9 are invalid but the PACK and UNPK instructions do not test for validity

Zoned decimal numbers are stored as 1-16 8-bit bytes each containing a zone in bits 0-3 and a digit in bits 4-7 The zone of the rightmost byte is interpreted as a sign

Floating point numbers are only stored as fullword or doubleword values on older models On the 36085and 360195there are also extended precision floating point numbers stored as quadwords For all

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 17

SP NOTES

three formats bit 0 is a sign and bits 0-7 are a characteristic (exponent biased by 64) Bits 8-31 (8-63) are a hexadecimal fraction For extended precision the low order doubleword has its own sign and characteristic which are ignored on input and generated on output

Lecture_no9 Various instruction format of IBM 360370

( iv)Instruction

Instructions in the S360 are two four or six bytes in length with the opcode in byte 0 Instructions have one of the following formats

RR (two bytes) R tells that there is data residing in a register So in this type there is data in register R

Generally byte 1 specifies two 4-bit register numbers but in some cases eg SVC byte 1 is a single 8-bit immediate field

RS (four bytes) This instruction can have 2 or 3 operands It depends how many the opcode indicates If it is 2 the first is a register and the second storage If it is 3 then the first and second are registers and the third is a storage location

Byte 1 specifies two register numbers bytes 2-3 specify a base and displacement

RX (four bytes) The X also refers to a storage address however this specifies actual storage addresses in 3 pieces

Byte 1 bits 0-3 specifies either a register number or a modifier byte 1 bits 4-7 specifies the number of the general register to be used as an index bytes 2-3 specify a base and displacement

SI (four bytes) I tells that the operand consists of immediate data (meaning that the data is in the instruction itself) So in this type the first is a storage location and the second is immediate data

Byte 1 specifies an immediate field bytes 2-3 specify a base and displacement

SS (six bytes) S tells that that there is data in storage Here both operands have data in storage

Byte 1 specifies two 4-bit length fields or one 8-bit length field bytes 2-3 and 4-5 each specify a base and displacement The encoding of the length fields is length-1

Instructions must be on a two-byte boundary in memory hence the low-order bit of the instruction address is always 0

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 18

SP NOTES

Addressing Mode

1Implied mode Operand is in a register implied by the OpCode of the instruction ExampleADD 312 Immediate mode Operand is a constant value specified in the instruction itself ExampleADD R 103 Register mode Operand is in register specified in the instruction itself ExampleADD S D

4 Register indirect mode Operand are is in a memory address that is content of a register specifiedby the instruction itself ExampleADD (D) 35 Direct mode Absolute address of operand is explicitly specified in instruction ExampleADD 1234 S6 Indirect mode Absolute address of operand is content of a memory address ExampleADD [1234] 107 Relative mode Content of PC + OpAddr ExampleADD D $S8 Indexed mode Content of an index register + OpAddr ExampleADD D 500(S)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 19

SP NOTES

Lecture_no10 Machine language Long way no looping method

Elementary Instruction Set

Typical Data Transfer Instructions

Typical Arithmetic Instructions

Typical Logical and Bit Manipulation Instr

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 20

SP NOTES

Example Write a program that compare two positive numbers x and y Output is -1 if x lt y 0if x = y +1 if x gt y

MOVE R1 x MOVE R2 y MOVE R3 0 SUB R1 R2 BN Less BZ Zero MOVE R3 +1Less MOVE R3 -1Zero MOVE R3 0 HLT

Lecture_no11 Address modification using instruction as data ampusing index register

Lecture_no12 Looping method with example

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 21

SP NOTES

Lecture_no13 Assembly language machine language

Assembly LanguageAssembly instructions are just shorthand for machine instructions

Machine language Equivalent assembly1000000100100101 LOAD R1 51000000101000101 LOAD R2 51010000100000110 ADD R0 R1 R21000001000000110 SAVE R0 61111111111111111 HALT (For all assembly instructions that compute a result the first argument is the destination)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 22

SP NOTES

Very easy to write an Assembly Language -gtMachine language translator

Exercise

What would be the assembly instructions to swap the contents of registers 1 amp 2Exercise Solution

STORE 1 R1MOVE R1 R2LOAD R2 1HALT

Machine Language

The machine language for a particular computer is tied to the architecture of the CPU For example G4 Macs have a different machine language than Intel PCs We will look at the machine language of a simple simulated computer

Machine Language ExampleOur computer has 4 registers and 32 memory locations Each instruction is 16 bits Here is a machine language program for our simulated computer

10000001001001011000000101000101101000010000011010000010000001101111111111111111

LOAD contents of memorylocation into register

100000010 RR MMMMMex R0 = Mem[3]100000010 00 00011

STORE contents of registerinto memory location

100000100 RR MMMMMex Mem[4] = R0100000100 00 00100

MOVE contents of oneregister into another register

100100010000 RR RRex R0 = R1100100010000 00 01

ADD contents of 2 registersstore result in third

1010000100 RR RR RRex R0 = R1 + R21010000100 00 01 10

SUBTRACT contents of 2registers store result into third

1010001000 RR RR RRex R0 = R1 ndash R2

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 23

SP NOTES

1010001000 00 01 10

Halt the program 1111111111111111

Lecture_no14 Meaning of Assembly language program types Literals

bullUSING

bullBALR

bullSTART

bullEND

bullBR

bullLiteral

bullLTORG

bullBCT

bullEQU ExampleSTART EQU 10

The directive EQU stands for equals it allows you to give a numerical value to a label In this example the assembler would always replace START with 1016 Hence you could write

START EQU 10ORG STARTGOTO 10

bullORG

The directive ORG does not get translated into any code what it does is to set the address at which the next instruction will be stored

Example

ORG 0GOTO 10

The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000

But it will be placed at the address 0 ORG 0 does not get translated into any code but it affects where the code for GOTO 10 is placed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 24

SP NOTES

bullpseudo-ops vs machine-ops

Lecture_no15 Example on Assembly language program

Assembly language is a low-level programming language for a computer or other programmable device specific to a particular computer architecture in contrast to most high-level programming languages which are generally portable across multiple systems Assembly language is converted into executable machine code by a utility program referred to as an assembler like NASM MASM etc

LABEL OPCODE OPERANDS COMMENTS01 02 Program to multiply an integer by the constant 603 Before execution an integer must be stored in NUMBER0405 ORIG x305006 LEA R2 NUMBER07 LDW R2 R2 008 LEA R1 SIX09 LDW R1 R1 00A AND R3R30 Clear R3 It will0B contain the product0C The inner loop0D 0E AGAIN ADD R3R3R20F ADD R1R1-1 R1 keeps track of10 BRp AGAIN the iterations11 12 HALT13

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 25

SP NOTES

14 NUMBER BLKW 115 SIX FILL x000616 17 END

Figure 71 An assembly language programTwo of the parts (LABEL and COMMENTS) are optionalOpcodes and OperandsTwo of the parts (OPCODE and OPERANDS) are mandatory The OPCODE is a symbolic name for the opcode

The number of operands depends on the operation being performedAGAIN ADD R3R3R2The operands to be added are obtained from register 2 and from register 3 The resultis to be placed in register 3 We represent each of the registers 0 through 7 as R0 R2helliphelliphelliphellip R7

The location whose address is to be read is given the label NUMBER The destination into which that address is to be loaded is register 2LEA R2 NUMBERA literal value must contain a symbol identifying the representation base of the number We use for decimal x for hexadecimal and b for binary

LabelsLabels are symbolic names which are used to identify memory locations that are referred to explicitly in the programThere are two reasons for explicitly referring to a memory location1 The location contains the target of a branch instruction (for example AGAIN in line 0E)2 The location contains a value that is loaded or stored (for example NUMBER line 14 and SIX line 15)The location AGAIN is specifically referenced by the branch instruction in line 10

BRp AGAIN

If the result of ADD R1R1ndash1 is positive (as evidenced by the P condition code being set) then the program branches to the location explicitly referenced as AGAIN to perform another iterationThe value stored in the memory location explicitly referenced as NUMBER is loaded into R2If a location in the program is not explicitly referenced then there is no need to give it a label

CommentsComments are messages intended only for human consumption The purpose of comments is to make the program more comprehensible to the human reader

Pseudo-ops (Assembler Directives)

Actually a more formal name for a pseudo-op is assembler directive They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution Rather the pseudo-op is strictly a message to the assembler to help the assembler in the assembly processORIG

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 26

SP NOTES

ORIG tells the assembler where in memory to place the program In line 05 ORIG x3050 says start with location x3050 As a result the LEA R2NUMBER instruction will be put in location x3050FILLFILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand In line 15 the ninth location in the resultant program is initialized to the value x0006BLKWBLKW tells the assembler to set aside some number of sequential memory locations (ie a BLocK ofWords) in the program The actual number is the operand of the BLKW pseudo-op In line 11 the pseudo-op instructs the assembler to set aside one location in memory

Lecture_no16 surprise test

Lecture_no17 MODULE-2

AssemblerWhy learning AssemblerAssembler or other languages that is the question Why should I learn another language if I alreadylearned other programming languages The best argument while you live in France you are able to getthrough by speaking english but you will never feel at home then and life remains complicated You canget through with this but it is rather inappropriate If things need a hurry you should use the countryslanguageAssemblerTo translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmerDisassenblerTo translate machine code to their assembly source

Cross AssemblerThe assembler may run on one machine and produce object code for another

An assembly is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader

(Fig Assembler)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 27

SP NOTES

For example the externally defined symbols (library program) must be indicated to the loaderthe assembler does not know the address of these symbols and it is up to the loader to find theprograms containing them load them into memory and place the values of these symbols in thecalling program

(Fig Program Translation Steps)

Fundamental assembler functions1048698 Translate mnemonic operation codes their machine language equivalents1048698 Assign machine addresses to symbolic labels used by the programmer

Basic Assembler Functions

Convert mnemonic operation codes to machine language equivalents

Convert symbolic operands to machine addresses (pass 1)

Build machine instructions in the proper format

Convert data constants to internal (machine)representations

Error checking is provided

Write the object program and assembly listing files

Assembler directives

-gtThe assembler can also process assembler directivesbull Assembler directives (or pseudo-instructions) provide instructions to the assembler itself Theyare not translated into machine instructions

-gtPseudo-Instructions Not translated into machine instructions Providing information to the assembler-gtBasic assembler directives-

bull START -Specify name and staring address for the program END- Indicate the end of the source program and (optionally) specify the first executable instruction in

the program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 28

SP NOTES

BYTE -Generate character or hexadecimal constant occupying as many bytes as needed to represent the constant

WORD- Generate one-word integer constant RESB- Reserve the indicated number of bytes for a data area RESW- Reserve the indicated number of words for a data area

Lecture_no18 General design procedure of 360-type Assembler

Syntax Assembly language statements

Assembly language statements are entered one statement per line Each statement follows the following format

[label] mnemonic [operands] [comment]

The fields in the square brackets are optional A basic instruction has two parts the first one is the name of the instruction (or the mnemonic) which is to be executed and the second are the operands or the parameters of the command

An assembly language statement contains the following fields -gt Label Field -it is an identifier amp optional fieldThe maximum lengnth of label differs between assemblers It can be used to define a symbolSome accepts upto 32 characters long other only 4 charactersA label when declared is suffixed by a colon and begins with valid characters(AhelliphellipZ)

Ex-

START LDAA 24 H

Here Label START is equal to the address of the instruction LDAA 24 H -gt MnemonicOpcode -This field contains a mnemonic It stands for operation code or pseudo-opie a machine code instruction it requires additional information ieoperands -gt Operand Field -it specifies either the address or the data that the opcode requires -gt Comment Field allows the programmer to document the software This field is optional and is only printed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 29

SP NOTES

on the source listing for documentation purposes The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character

Types of Assembly language statements

It consists of 3 typeseg

(i)Imperative statement

(ii)Declarative statements

(iii)Assembler directive statements

Advantages amp disadvantages of Assembly language

Lecture_no19 Design Specification of 1-pass assembler with example

Following steps are used for design specification of an assembler-1)Identify the information necessary to perform a task2)Specify the data structure and define format of data structure3)Determine the processing necessary to obtain and maintain the information4)Determine the processing necessary to perform a task

Assembler Design can be done in

ndashSingle pass

ndashTwo pass

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 30

Source program

Pass1

OPTAB SYMTAB

Intermediate file

Pass2

SYMTAB

Object codes

OPTAB

SP NOTES

The Intermediate file include each source statement assigned address and error indicator

Pass I

Separate the symbols mnemonic op-code and operational fields Determine the storage requirement for every assembly language statement and up date the location

counter Build the symbol table (Table that is used to store each label and its corresponding value)

Pass II

Generate object code(perform actual translation)

One-Pass Assembler

The translation performed by an assembler is essentially a collection of substitutions machine operation code

for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines

bullSingle Pass AssemblerndashDoes everything in single passndashCannot resolve the forward referencing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 31

LOCCTR

SP NOTES

(detailed pass1 assembler flowchart)

Internal data structures1048698 the Operation Code Table (OPTAB) bull OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents1048698 the Symbol Table (SYMTAB) bull SYMTAB is used to store values (addresses) assigned to labels1048698 Location Counter (LOCCTR) bull This is a variable that is used to help in the assignment of address LOCCTR not LOCCTR + (instruction length)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 32

SP NOTES

Lecture_no20 Multi-pass assembler with example

Pass 1 Assign addresses to all statements in the program Save the values assigned to all labels for use in Pass 2 Perform some processing of assembler directivesPass 2 Assemble instructions Generate data values defined by BYTE WORD Perform processing of assembler directives not done in Pass 1 Write the object program and the assembly listing

(Detailed pass2 assembler flowchart)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 33

SP NOTES

Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler

The difference between one pass and two pass assemblers are-A one pass assembler passes over the source file exactly once in the same pass collecting the labels resolving future references and doing the actual assembly The difficult part is to resolve future label references and assemble code in one pass

A two pass assembler does two passes over the source file ( the second pass can be over a file generated in the first pass ) In the first pass all it does is looks for label definitions and introduces them in the symbol table In the second pass after the symbol table is complete it does the actual assembly by translating the operations and so on

Assembler Design Machine Dependent Assembler Features instruction formats and addressing modes program relocation

Machine Independent Assembler Features literals symbol-defining statements expressions program blocks control sections and program linking

Object Program

Header Col 1 H Col 2~7 Program name Col 8~13 Starting address (hex) Col 14-19 Length of object program in bytes (hex) Text Col1 T Col2~7 Starting address in this record (hex) Col 8~9 Length of object code in this record in bytes (hex) Col 10~69Object code (69-10+1)6=10 instructions End Col1 E Col 2~7 Address of 16 Col2 first executable instruction (hex) (END program_name)

Absolute ProgramThe program must be loaded at the specified (absolute) address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 34

SP NOTES

Relocatable Program

The actual starting address is not known until load time The program can be loaded at different address of memory

How to Solvea Modification recordb Base relativec Program counter relative

Modification record Col 1 M Col 2-7 Starting location of the address field to be modified relative to the beginning of the program Col 8-9 length of the address field to be modified in halfbytes

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 35

SP NOTES

Lecture_no22 Macro language amp Macro processor

MACRO PROCESSOR

The assembly language programmer often finds it necessary to repeat some statements or block of code several times in a program The block may consist of code to swap sets of registers do some arithmetic operations In this situation the programmer find a macro instruction facility useful Macro instruction (often called macros) are single line abbreviation for group of instructions In employing a macro the programmer essentially defines a single instruction to represent a block of code For every occurrence of this one-line macro instruction in his program the macro processing assembler substitute the entire block

A macro represents a commonly used group of statements in the source pro ramming language The macro processor replaces each macro instruction with the corresponding group of source language statements This is called expanding of macros

Obj

Basic Macro Processor Functions

Macro Definition and Usage

In this section we will present one more example to highlight salient aspects of macro-processor The example is very similar to Intels 8 bit microprocessor assembly language instruction

ExampleMACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 36

Source code (with macro)

Macro Processor

Expanded code

Compiler Assembler

SP NOTES

A macro definition is placed at the start of a program enclosed between the statements MACRO and ENDM A MACRO statement indicates that a macro definition starts while statement ENDM indicates the end of a macro definition Thus a group of statements start ing with MACRO and ending with ENDM constitutes one macro definition unit If many macros are to be defined in a program as many definition

modules will exist at the start of the program Each definition module contains a new operation and defines it to consist of a sequence of assembly language statement In example above INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence

The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement Appearance of a macro name in the mnemonic field amounts to a call on the macro The assembler replaces such a statement by the statement sequence comprising the macro This is known as macro expansion

INCRMT XY

is shown to lead to insertion of the assembly statements

LOAD X

ADD Y

STORE X

in its place All macro calls in a program are expanded in this fashion

Defining a Macro

Let us take another look at the macro definition unit

MACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

The macro header statement indicates the existence of a macro definition unit Absence of the header statement as the first statement of a program or ft first statement following a macro definition unit signals the start of the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 37

SP NOTES

main assembly language program The next statement in the definition unit is die prototype for a macro call This statement names the macro and indicates how the operands in any call on the macro would be written

The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion

Basic Macro Processor Functions

-gtDirectives used during usage of Macro

Macro Indicates begin of Macro MEND indicates end of Macro

-gtPrototype for Macro

Each argument starts with Name and macro Parameter list

MEND

-gtBODY The statement will be generated as the expansion of Macro

Macro Expansion

Each macro invocation statement will be expanded into the statements that form the body of the macro

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 38

SP NOTES

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)

In the definition of macro parameter In the macro invocation argument

Lecture_no23 Machine-Independent Macro Processor Features

MACRO INVOCATION

A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro

macro_name p1 p2 hellip

Difference between macro call and procedure call

Macro call statements of the macro body are expanded each time the macro is invoked

Procedure call statements of the subroutine appear only one regardless of how many times the subroutine is called

Macro parameters

i) Positional parameter

The prototype statement indicates how operands in a macro call would be written These operands are called parameters or arguments All parameters used in the prototype statement have names starting with the special character amp These parameters are known as formal parameters A macro call is written using parameter names which do not start with ft special character amp These are known as actual parameters

The lists of formal and actual parameters also called as formal and actual parameter lists specified in the prototype and macro call statements respectively establish a correspondence between each formal parameter and an actual parameter In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists Thus the first actual parameter in the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 39

SP NOTES

fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again

INCRMT ampA ampB prototype

INCRMT XY macro call

We see that X would be paired with ampA and Y with ampB While expanding a macro call any formal parameter appearing within a model statement is replaced by the corresponding actual parameter This is how expansion of the call INCR XY heads to the following statements

LOAD X

ADD Y

STORE X

Example

STRG DATA1 DATA2 DATA3

GENER DIRECT3

(ii) Keyword parameter

Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter

ExampleSTRG ampa3=DATA1 ampa2=DATA2 ampa1=DATA3

GENER TYPE=DIRECT CHANNEL=3

Features of macro facility

(1)Macro instruction argument

(2)Macro calls within macro(nested macro calls)

(3)Macro instruction defining(macros within macro)

(4)Conditional macro ExpansionLecture_no24 Design of 1-pass macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 40

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Lecture_no8 Memory unit Register Data of IBM 360

General machine structure of IBM 360370

The IBM System360 (S360) was a mainframe computer system family announced by IBM on April 7 1964 and delivered between 1965 and 1978 It was the first family of computers designed to cover the complete range of applications from small to large both commercial and scientific

(i)Memory unit

Memory (storage) in System360 is addressed in terms of 8-bit bytes Various instructions operate on larger units called halfword (2 bytes) fullword (4 bytes) doubleword (8 bytes) quad word (16 bytes) and 2048 byte storage block specifying the leftmost (lowest address) of the unit Within a halfword fullword doubleword or quadword low numbered bytes are more significant than high numbered bytes this is sometimes referred to as big-endian Many uses for these units require aligning them on the corresponding boundaries Within this article the unqualified term word refers to a fullword

The architecture of System360 provided for up to 224 = 16777216 bytes of memory however the Model 67 extended the architecture and allowed 232 = 4294967296 bytes of virtual memory

(ii)Register-

Registers are areas of storage that are set aside in the processing unit

There are 16 registers in the 370 system (General purpose registerFloating-point registerControl amp status registerdata register address register)

Four fields of information-1) the name field- optional field and must begin with a character from the alphabet can consist of 8 characters2) opcode ndash mandatory field and must be followed by a blank3) operands ndash usually represent data that could be in main storage registers or as part of the instruction itself4) comments ndash used to clarify instructions

(iii)Data

Characters are stored as 8-bit bytes Integers are stored as twos complement binary halfword or fullword values Packed decimal numbers are stored as 1-16 8-bit bytes containing an odd number of decimal digits

followed by a 4-bit sign Sign values of hexadecimal A C E and F are positive and sign values of hexadecimal B and D are negative Digit values of hexadecimal A-F and sign values of 0-9 are invalid but the PACK and UNPK instructions do not test for validity

Zoned decimal numbers are stored as 1-16 8-bit bytes each containing a zone in bits 0-3 and a digit in bits 4-7 The zone of the rightmost byte is interpreted as a sign

Floating point numbers are only stored as fullword or doubleword values on older models On the 36085and 360195there are also extended precision floating point numbers stored as quadwords For all

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 17

SP NOTES

three formats bit 0 is a sign and bits 0-7 are a characteristic (exponent biased by 64) Bits 8-31 (8-63) are a hexadecimal fraction For extended precision the low order doubleword has its own sign and characteristic which are ignored on input and generated on output

Lecture_no9 Various instruction format of IBM 360370

( iv)Instruction

Instructions in the S360 are two four or six bytes in length with the opcode in byte 0 Instructions have one of the following formats

RR (two bytes) R tells that there is data residing in a register So in this type there is data in register R

Generally byte 1 specifies two 4-bit register numbers but in some cases eg SVC byte 1 is a single 8-bit immediate field

RS (four bytes) This instruction can have 2 or 3 operands It depends how many the opcode indicates If it is 2 the first is a register and the second storage If it is 3 then the first and second are registers and the third is a storage location

Byte 1 specifies two register numbers bytes 2-3 specify a base and displacement

RX (four bytes) The X also refers to a storage address however this specifies actual storage addresses in 3 pieces

Byte 1 bits 0-3 specifies either a register number or a modifier byte 1 bits 4-7 specifies the number of the general register to be used as an index bytes 2-3 specify a base and displacement

SI (four bytes) I tells that the operand consists of immediate data (meaning that the data is in the instruction itself) So in this type the first is a storage location and the second is immediate data

Byte 1 specifies an immediate field bytes 2-3 specify a base and displacement

SS (six bytes) S tells that that there is data in storage Here both operands have data in storage

Byte 1 specifies two 4-bit length fields or one 8-bit length field bytes 2-3 and 4-5 each specify a base and displacement The encoding of the length fields is length-1

Instructions must be on a two-byte boundary in memory hence the low-order bit of the instruction address is always 0

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 18

SP NOTES

Addressing Mode

1Implied mode Operand is in a register implied by the OpCode of the instruction ExampleADD 312 Immediate mode Operand is a constant value specified in the instruction itself ExampleADD R 103 Register mode Operand is in register specified in the instruction itself ExampleADD S D

4 Register indirect mode Operand are is in a memory address that is content of a register specifiedby the instruction itself ExampleADD (D) 35 Direct mode Absolute address of operand is explicitly specified in instruction ExampleADD 1234 S6 Indirect mode Absolute address of operand is content of a memory address ExampleADD [1234] 107 Relative mode Content of PC + OpAddr ExampleADD D $S8 Indexed mode Content of an index register + OpAddr ExampleADD D 500(S)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 19

SP NOTES

Lecture_no10 Machine language Long way no looping method

Elementary Instruction Set

Typical Data Transfer Instructions

Typical Arithmetic Instructions

Typical Logical and Bit Manipulation Instr

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 20

SP NOTES

Example Write a program that compare two positive numbers x and y Output is -1 if x lt y 0if x = y +1 if x gt y

MOVE R1 x MOVE R2 y MOVE R3 0 SUB R1 R2 BN Less BZ Zero MOVE R3 +1Less MOVE R3 -1Zero MOVE R3 0 HLT

Lecture_no11 Address modification using instruction as data ampusing index register

Lecture_no12 Looping method with example

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 21

SP NOTES

Lecture_no13 Assembly language machine language

Assembly LanguageAssembly instructions are just shorthand for machine instructions

Machine language Equivalent assembly1000000100100101 LOAD R1 51000000101000101 LOAD R2 51010000100000110 ADD R0 R1 R21000001000000110 SAVE R0 61111111111111111 HALT (For all assembly instructions that compute a result the first argument is the destination)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 22

SP NOTES

Very easy to write an Assembly Language -gtMachine language translator

Exercise

What would be the assembly instructions to swap the contents of registers 1 amp 2Exercise Solution

STORE 1 R1MOVE R1 R2LOAD R2 1HALT

Machine Language

The machine language for a particular computer is tied to the architecture of the CPU For example G4 Macs have a different machine language than Intel PCs We will look at the machine language of a simple simulated computer

Machine Language ExampleOur computer has 4 registers and 32 memory locations Each instruction is 16 bits Here is a machine language program for our simulated computer

10000001001001011000000101000101101000010000011010000010000001101111111111111111

LOAD contents of memorylocation into register

100000010 RR MMMMMex R0 = Mem[3]100000010 00 00011

STORE contents of registerinto memory location

100000100 RR MMMMMex Mem[4] = R0100000100 00 00100

MOVE contents of oneregister into another register

100100010000 RR RRex R0 = R1100100010000 00 01

ADD contents of 2 registersstore result in third

1010000100 RR RR RRex R0 = R1 + R21010000100 00 01 10

SUBTRACT contents of 2registers store result into third

1010001000 RR RR RRex R0 = R1 ndash R2

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 23

SP NOTES

1010001000 00 01 10

Halt the program 1111111111111111

Lecture_no14 Meaning of Assembly language program types Literals

bullUSING

bullBALR

bullSTART

bullEND

bullBR

bullLiteral

bullLTORG

bullBCT

bullEQU ExampleSTART EQU 10

The directive EQU stands for equals it allows you to give a numerical value to a label In this example the assembler would always replace START with 1016 Hence you could write

START EQU 10ORG STARTGOTO 10

bullORG

The directive ORG does not get translated into any code what it does is to set the address at which the next instruction will be stored

Example

ORG 0GOTO 10

The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000

But it will be placed at the address 0 ORG 0 does not get translated into any code but it affects where the code for GOTO 10 is placed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 24

SP NOTES

bullpseudo-ops vs machine-ops

Lecture_no15 Example on Assembly language program

Assembly language is a low-level programming language for a computer or other programmable device specific to a particular computer architecture in contrast to most high-level programming languages which are generally portable across multiple systems Assembly language is converted into executable machine code by a utility program referred to as an assembler like NASM MASM etc

LABEL OPCODE OPERANDS COMMENTS01 02 Program to multiply an integer by the constant 603 Before execution an integer must be stored in NUMBER0405 ORIG x305006 LEA R2 NUMBER07 LDW R2 R2 008 LEA R1 SIX09 LDW R1 R1 00A AND R3R30 Clear R3 It will0B contain the product0C The inner loop0D 0E AGAIN ADD R3R3R20F ADD R1R1-1 R1 keeps track of10 BRp AGAIN the iterations11 12 HALT13

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 25

SP NOTES

14 NUMBER BLKW 115 SIX FILL x000616 17 END

Figure 71 An assembly language programTwo of the parts (LABEL and COMMENTS) are optionalOpcodes and OperandsTwo of the parts (OPCODE and OPERANDS) are mandatory The OPCODE is a symbolic name for the opcode

The number of operands depends on the operation being performedAGAIN ADD R3R3R2The operands to be added are obtained from register 2 and from register 3 The resultis to be placed in register 3 We represent each of the registers 0 through 7 as R0 R2helliphelliphelliphellip R7

The location whose address is to be read is given the label NUMBER The destination into which that address is to be loaded is register 2LEA R2 NUMBERA literal value must contain a symbol identifying the representation base of the number We use for decimal x for hexadecimal and b for binary

LabelsLabels are symbolic names which are used to identify memory locations that are referred to explicitly in the programThere are two reasons for explicitly referring to a memory location1 The location contains the target of a branch instruction (for example AGAIN in line 0E)2 The location contains a value that is loaded or stored (for example NUMBER line 14 and SIX line 15)The location AGAIN is specifically referenced by the branch instruction in line 10

BRp AGAIN

If the result of ADD R1R1ndash1 is positive (as evidenced by the P condition code being set) then the program branches to the location explicitly referenced as AGAIN to perform another iterationThe value stored in the memory location explicitly referenced as NUMBER is loaded into R2If a location in the program is not explicitly referenced then there is no need to give it a label

CommentsComments are messages intended only for human consumption The purpose of comments is to make the program more comprehensible to the human reader

Pseudo-ops (Assembler Directives)

Actually a more formal name for a pseudo-op is assembler directive They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution Rather the pseudo-op is strictly a message to the assembler to help the assembler in the assembly processORIG

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 26

SP NOTES

ORIG tells the assembler where in memory to place the program In line 05 ORIG x3050 says start with location x3050 As a result the LEA R2NUMBER instruction will be put in location x3050FILLFILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand In line 15 the ninth location in the resultant program is initialized to the value x0006BLKWBLKW tells the assembler to set aside some number of sequential memory locations (ie a BLocK ofWords) in the program The actual number is the operand of the BLKW pseudo-op In line 11 the pseudo-op instructs the assembler to set aside one location in memory

Lecture_no16 surprise test

Lecture_no17 MODULE-2

AssemblerWhy learning AssemblerAssembler or other languages that is the question Why should I learn another language if I alreadylearned other programming languages The best argument while you live in France you are able to getthrough by speaking english but you will never feel at home then and life remains complicated You canget through with this but it is rather inappropriate If things need a hurry you should use the countryslanguageAssemblerTo translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmerDisassenblerTo translate machine code to their assembly source

Cross AssemblerThe assembler may run on one machine and produce object code for another

An assembly is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader

(Fig Assembler)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 27

SP NOTES

For example the externally defined symbols (library program) must be indicated to the loaderthe assembler does not know the address of these symbols and it is up to the loader to find theprograms containing them load them into memory and place the values of these symbols in thecalling program

(Fig Program Translation Steps)

Fundamental assembler functions1048698 Translate mnemonic operation codes their machine language equivalents1048698 Assign machine addresses to symbolic labels used by the programmer

Basic Assembler Functions

Convert mnemonic operation codes to machine language equivalents

Convert symbolic operands to machine addresses (pass 1)

Build machine instructions in the proper format

Convert data constants to internal (machine)representations

Error checking is provided

Write the object program and assembly listing files

Assembler directives

-gtThe assembler can also process assembler directivesbull Assembler directives (or pseudo-instructions) provide instructions to the assembler itself Theyare not translated into machine instructions

-gtPseudo-Instructions Not translated into machine instructions Providing information to the assembler-gtBasic assembler directives-

bull START -Specify name and staring address for the program END- Indicate the end of the source program and (optionally) specify the first executable instruction in

the program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 28

SP NOTES

BYTE -Generate character or hexadecimal constant occupying as many bytes as needed to represent the constant

WORD- Generate one-word integer constant RESB- Reserve the indicated number of bytes for a data area RESW- Reserve the indicated number of words for a data area

Lecture_no18 General design procedure of 360-type Assembler

Syntax Assembly language statements

Assembly language statements are entered one statement per line Each statement follows the following format

[label] mnemonic [operands] [comment]

The fields in the square brackets are optional A basic instruction has two parts the first one is the name of the instruction (or the mnemonic) which is to be executed and the second are the operands or the parameters of the command

An assembly language statement contains the following fields -gt Label Field -it is an identifier amp optional fieldThe maximum lengnth of label differs between assemblers It can be used to define a symbolSome accepts upto 32 characters long other only 4 charactersA label when declared is suffixed by a colon and begins with valid characters(AhelliphellipZ)

Ex-

START LDAA 24 H

Here Label START is equal to the address of the instruction LDAA 24 H -gt MnemonicOpcode -This field contains a mnemonic It stands for operation code or pseudo-opie a machine code instruction it requires additional information ieoperands -gt Operand Field -it specifies either the address or the data that the opcode requires -gt Comment Field allows the programmer to document the software This field is optional and is only printed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 29

SP NOTES

on the source listing for documentation purposes The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character

Types of Assembly language statements

It consists of 3 typeseg

(i)Imperative statement

(ii)Declarative statements

(iii)Assembler directive statements

Advantages amp disadvantages of Assembly language

Lecture_no19 Design Specification of 1-pass assembler with example

Following steps are used for design specification of an assembler-1)Identify the information necessary to perform a task2)Specify the data structure and define format of data structure3)Determine the processing necessary to obtain and maintain the information4)Determine the processing necessary to perform a task

Assembler Design can be done in

ndashSingle pass

ndashTwo pass

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 30

Source program

Pass1

OPTAB SYMTAB

Intermediate file

Pass2

SYMTAB

Object codes

OPTAB

SP NOTES

The Intermediate file include each source statement assigned address and error indicator

Pass I

Separate the symbols mnemonic op-code and operational fields Determine the storage requirement for every assembly language statement and up date the location

counter Build the symbol table (Table that is used to store each label and its corresponding value)

Pass II

Generate object code(perform actual translation)

One-Pass Assembler

The translation performed by an assembler is essentially a collection of substitutions machine operation code

for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines

bullSingle Pass AssemblerndashDoes everything in single passndashCannot resolve the forward referencing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 31

LOCCTR

SP NOTES

(detailed pass1 assembler flowchart)

Internal data structures1048698 the Operation Code Table (OPTAB) bull OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents1048698 the Symbol Table (SYMTAB) bull SYMTAB is used to store values (addresses) assigned to labels1048698 Location Counter (LOCCTR) bull This is a variable that is used to help in the assignment of address LOCCTR not LOCCTR + (instruction length)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 32

SP NOTES

Lecture_no20 Multi-pass assembler with example

Pass 1 Assign addresses to all statements in the program Save the values assigned to all labels for use in Pass 2 Perform some processing of assembler directivesPass 2 Assemble instructions Generate data values defined by BYTE WORD Perform processing of assembler directives not done in Pass 1 Write the object program and the assembly listing

(Detailed pass2 assembler flowchart)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 33

SP NOTES

Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler

The difference between one pass and two pass assemblers are-A one pass assembler passes over the source file exactly once in the same pass collecting the labels resolving future references and doing the actual assembly The difficult part is to resolve future label references and assemble code in one pass

A two pass assembler does two passes over the source file ( the second pass can be over a file generated in the first pass ) In the first pass all it does is looks for label definitions and introduces them in the symbol table In the second pass after the symbol table is complete it does the actual assembly by translating the operations and so on

Assembler Design Machine Dependent Assembler Features instruction formats and addressing modes program relocation

Machine Independent Assembler Features literals symbol-defining statements expressions program blocks control sections and program linking

Object Program

Header Col 1 H Col 2~7 Program name Col 8~13 Starting address (hex) Col 14-19 Length of object program in bytes (hex) Text Col1 T Col2~7 Starting address in this record (hex) Col 8~9 Length of object code in this record in bytes (hex) Col 10~69Object code (69-10+1)6=10 instructions End Col1 E Col 2~7 Address of 16 Col2 first executable instruction (hex) (END program_name)

Absolute ProgramThe program must be loaded at the specified (absolute) address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 34

SP NOTES

Relocatable Program

The actual starting address is not known until load time The program can be loaded at different address of memory

How to Solvea Modification recordb Base relativec Program counter relative

Modification record Col 1 M Col 2-7 Starting location of the address field to be modified relative to the beginning of the program Col 8-9 length of the address field to be modified in halfbytes

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 35

SP NOTES

Lecture_no22 Macro language amp Macro processor

MACRO PROCESSOR

The assembly language programmer often finds it necessary to repeat some statements or block of code several times in a program The block may consist of code to swap sets of registers do some arithmetic operations In this situation the programmer find a macro instruction facility useful Macro instruction (often called macros) are single line abbreviation for group of instructions In employing a macro the programmer essentially defines a single instruction to represent a block of code For every occurrence of this one-line macro instruction in his program the macro processing assembler substitute the entire block

A macro represents a commonly used group of statements in the source pro ramming language The macro processor replaces each macro instruction with the corresponding group of source language statements This is called expanding of macros

Obj

Basic Macro Processor Functions

Macro Definition and Usage

In this section we will present one more example to highlight salient aspects of macro-processor The example is very similar to Intels 8 bit microprocessor assembly language instruction

ExampleMACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 36

Source code (with macro)

Macro Processor

Expanded code

Compiler Assembler

SP NOTES

A macro definition is placed at the start of a program enclosed between the statements MACRO and ENDM A MACRO statement indicates that a macro definition starts while statement ENDM indicates the end of a macro definition Thus a group of statements start ing with MACRO and ending with ENDM constitutes one macro definition unit If many macros are to be defined in a program as many definition

modules will exist at the start of the program Each definition module contains a new operation and defines it to consist of a sequence of assembly language statement In example above INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence

The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement Appearance of a macro name in the mnemonic field amounts to a call on the macro The assembler replaces such a statement by the statement sequence comprising the macro This is known as macro expansion

INCRMT XY

is shown to lead to insertion of the assembly statements

LOAD X

ADD Y

STORE X

in its place All macro calls in a program are expanded in this fashion

Defining a Macro

Let us take another look at the macro definition unit

MACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

The macro header statement indicates the existence of a macro definition unit Absence of the header statement as the first statement of a program or ft first statement following a macro definition unit signals the start of the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 37

SP NOTES

main assembly language program The next statement in the definition unit is die prototype for a macro call This statement names the macro and indicates how the operands in any call on the macro would be written

The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion

Basic Macro Processor Functions

-gtDirectives used during usage of Macro

Macro Indicates begin of Macro MEND indicates end of Macro

-gtPrototype for Macro

Each argument starts with Name and macro Parameter list

MEND

-gtBODY The statement will be generated as the expansion of Macro

Macro Expansion

Each macro invocation statement will be expanded into the statements that form the body of the macro

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 38

SP NOTES

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)

In the definition of macro parameter In the macro invocation argument

Lecture_no23 Machine-Independent Macro Processor Features

MACRO INVOCATION

A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro

macro_name p1 p2 hellip

Difference between macro call and procedure call

Macro call statements of the macro body are expanded each time the macro is invoked

Procedure call statements of the subroutine appear only one regardless of how many times the subroutine is called

Macro parameters

i) Positional parameter

The prototype statement indicates how operands in a macro call would be written These operands are called parameters or arguments All parameters used in the prototype statement have names starting with the special character amp These parameters are known as formal parameters A macro call is written using parameter names which do not start with ft special character amp These are known as actual parameters

The lists of formal and actual parameters also called as formal and actual parameter lists specified in the prototype and macro call statements respectively establish a correspondence between each formal parameter and an actual parameter In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists Thus the first actual parameter in the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 39

SP NOTES

fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again

INCRMT ampA ampB prototype

INCRMT XY macro call

We see that X would be paired with ampA and Y with ampB While expanding a macro call any formal parameter appearing within a model statement is replaced by the corresponding actual parameter This is how expansion of the call INCR XY heads to the following statements

LOAD X

ADD Y

STORE X

Example

STRG DATA1 DATA2 DATA3

GENER DIRECT3

(ii) Keyword parameter

Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter

ExampleSTRG ampa3=DATA1 ampa2=DATA2 ampa1=DATA3

GENER TYPE=DIRECT CHANNEL=3

Features of macro facility

(1)Macro instruction argument

(2)Macro calls within macro(nested macro calls)

(3)Macro instruction defining(macros within macro)

(4)Conditional macro ExpansionLecture_no24 Design of 1-pass macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 40

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

three formats bit 0 is a sign and bits 0-7 are a characteristic (exponent biased by 64) Bits 8-31 (8-63) are a hexadecimal fraction For extended precision the low order doubleword has its own sign and characteristic which are ignored on input and generated on output

Lecture_no9 Various instruction format of IBM 360370

( iv)Instruction

Instructions in the S360 are two four or six bytes in length with the opcode in byte 0 Instructions have one of the following formats

RR (two bytes) R tells that there is data residing in a register So in this type there is data in register R

Generally byte 1 specifies two 4-bit register numbers but in some cases eg SVC byte 1 is a single 8-bit immediate field

RS (four bytes) This instruction can have 2 or 3 operands It depends how many the opcode indicates If it is 2 the first is a register and the second storage If it is 3 then the first and second are registers and the third is a storage location

Byte 1 specifies two register numbers bytes 2-3 specify a base and displacement

RX (four bytes) The X also refers to a storage address however this specifies actual storage addresses in 3 pieces

Byte 1 bits 0-3 specifies either a register number or a modifier byte 1 bits 4-7 specifies the number of the general register to be used as an index bytes 2-3 specify a base and displacement

SI (four bytes) I tells that the operand consists of immediate data (meaning that the data is in the instruction itself) So in this type the first is a storage location and the second is immediate data

Byte 1 specifies an immediate field bytes 2-3 specify a base and displacement

SS (six bytes) S tells that that there is data in storage Here both operands have data in storage

Byte 1 specifies two 4-bit length fields or one 8-bit length field bytes 2-3 and 4-5 each specify a base and displacement The encoding of the length fields is length-1

Instructions must be on a two-byte boundary in memory hence the low-order bit of the instruction address is always 0

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 18

SP NOTES

Addressing Mode

1Implied mode Operand is in a register implied by the OpCode of the instruction ExampleADD 312 Immediate mode Operand is a constant value specified in the instruction itself ExampleADD R 103 Register mode Operand is in register specified in the instruction itself ExampleADD S D

4 Register indirect mode Operand are is in a memory address that is content of a register specifiedby the instruction itself ExampleADD (D) 35 Direct mode Absolute address of operand is explicitly specified in instruction ExampleADD 1234 S6 Indirect mode Absolute address of operand is content of a memory address ExampleADD [1234] 107 Relative mode Content of PC + OpAddr ExampleADD D $S8 Indexed mode Content of an index register + OpAddr ExampleADD D 500(S)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 19

SP NOTES

Lecture_no10 Machine language Long way no looping method

Elementary Instruction Set

Typical Data Transfer Instructions

Typical Arithmetic Instructions

Typical Logical and Bit Manipulation Instr

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 20

SP NOTES

Example Write a program that compare two positive numbers x and y Output is -1 if x lt y 0if x = y +1 if x gt y

MOVE R1 x MOVE R2 y MOVE R3 0 SUB R1 R2 BN Less BZ Zero MOVE R3 +1Less MOVE R3 -1Zero MOVE R3 0 HLT

Lecture_no11 Address modification using instruction as data ampusing index register

Lecture_no12 Looping method with example

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 21

SP NOTES

Lecture_no13 Assembly language machine language

Assembly LanguageAssembly instructions are just shorthand for machine instructions

Machine language Equivalent assembly1000000100100101 LOAD R1 51000000101000101 LOAD R2 51010000100000110 ADD R0 R1 R21000001000000110 SAVE R0 61111111111111111 HALT (For all assembly instructions that compute a result the first argument is the destination)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 22

SP NOTES

Very easy to write an Assembly Language -gtMachine language translator

Exercise

What would be the assembly instructions to swap the contents of registers 1 amp 2Exercise Solution

STORE 1 R1MOVE R1 R2LOAD R2 1HALT

Machine Language

The machine language for a particular computer is tied to the architecture of the CPU For example G4 Macs have a different machine language than Intel PCs We will look at the machine language of a simple simulated computer

Machine Language ExampleOur computer has 4 registers and 32 memory locations Each instruction is 16 bits Here is a machine language program for our simulated computer

10000001001001011000000101000101101000010000011010000010000001101111111111111111

LOAD contents of memorylocation into register

100000010 RR MMMMMex R0 = Mem[3]100000010 00 00011

STORE contents of registerinto memory location

100000100 RR MMMMMex Mem[4] = R0100000100 00 00100

MOVE contents of oneregister into another register

100100010000 RR RRex R0 = R1100100010000 00 01

ADD contents of 2 registersstore result in third

1010000100 RR RR RRex R0 = R1 + R21010000100 00 01 10

SUBTRACT contents of 2registers store result into third

1010001000 RR RR RRex R0 = R1 ndash R2

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 23

SP NOTES

1010001000 00 01 10

Halt the program 1111111111111111

Lecture_no14 Meaning of Assembly language program types Literals

bullUSING

bullBALR

bullSTART

bullEND

bullBR

bullLiteral

bullLTORG

bullBCT

bullEQU ExampleSTART EQU 10

The directive EQU stands for equals it allows you to give a numerical value to a label In this example the assembler would always replace START with 1016 Hence you could write

START EQU 10ORG STARTGOTO 10

bullORG

The directive ORG does not get translated into any code what it does is to set the address at which the next instruction will be stored

Example

ORG 0GOTO 10

The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000

But it will be placed at the address 0 ORG 0 does not get translated into any code but it affects where the code for GOTO 10 is placed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 24

SP NOTES

bullpseudo-ops vs machine-ops

Lecture_no15 Example on Assembly language program

Assembly language is a low-level programming language for a computer or other programmable device specific to a particular computer architecture in contrast to most high-level programming languages which are generally portable across multiple systems Assembly language is converted into executable machine code by a utility program referred to as an assembler like NASM MASM etc

LABEL OPCODE OPERANDS COMMENTS01 02 Program to multiply an integer by the constant 603 Before execution an integer must be stored in NUMBER0405 ORIG x305006 LEA R2 NUMBER07 LDW R2 R2 008 LEA R1 SIX09 LDW R1 R1 00A AND R3R30 Clear R3 It will0B contain the product0C The inner loop0D 0E AGAIN ADD R3R3R20F ADD R1R1-1 R1 keeps track of10 BRp AGAIN the iterations11 12 HALT13

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 25

SP NOTES

14 NUMBER BLKW 115 SIX FILL x000616 17 END

Figure 71 An assembly language programTwo of the parts (LABEL and COMMENTS) are optionalOpcodes and OperandsTwo of the parts (OPCODE and OPERANDS) are mandatory The OPCODE is a symbolic name for the opcode

The number of operands depends on the operation being performedAGAIN ADD R3R3R2The operands to be added are obtained from register 2 and from register 3 The resultis to be placed in register 3 We represent each of the registers 0 through 7 as R0 R2helliphelliphelliphellip R7

The location whose address is to be read is given the label NUMBER The destination into which that address is to be loaded is register 2LEA R2 NUMBERA literal value must contain a symbol identifying the representation base of the number We use for decimal x for hexadecimal and b for binary

LabelsLabels are symbolic names which are used to identify memory locations that are referred to explicitly in the programThere are two reasons for explicitly referring to a memory location1 The location contains the target of a branch instruction (for example AGAIN in line 0E)2 The location contains a value that is loaded or stored (for example NUMBER line 14 and SIX line 15)The location AGAIN is specifically referenced by the branch instruction in line 10

BRp AGAIN

If the result of ADD R1R1ndash1 is positive (as evidenced by the P condition code being set) then the program branches to the location explicitly referenced as AGAIN to perform another iterationThe value stored in the memory location explicitly referenced as NUMBER is loaded into R2If a location in the program is not explicitly referenced then there is no need to give it a label

CommentsComments are messages intended only for human consumption The purpose of comments is to make the program more comprehensible to the human reader

Pseudo-ops (Assembler Directives)

Actually a more formal name for a pseudo-op is assembler directive They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution Rather the pseudo-op is strictly a message to the assembler to help the assembler in the assembly processORIG

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 26

SP NOTES

ORIG tells the assembler where in memory to place the program In line 05 ORIG x3050 says start with location x3050 As a result the LEA R2NUMBER instruction will be put in location x3050FILLFILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand In line 15 the ninth location in the resultant program is initialized to the value x0006BLKWBLKW tells the assembler to set aside some number of sequential memory locations (ie a BLocK ofWords) in the program The actual number is the operand of the BLKW pseudo-op In line 11 the pseudo-op instructs the assembler to set aside one location in memory

Lecture_no16 surprise test

Lecture_no17 MODULE-2

AssemblerWhy learning AssemblerAssembler or other languages that is the question Why should I learn another language if I alreadylearned other programming languages The best argument while you live in France you are able to getthrough by speaking english but you will never feel at home then and life remains complicated You canget through with this but it is rather inappropriate If things need a hurry you should use the countryslanguageAssemblerTo translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmerDisassenblerTo translate machine code to their assembly source

Cross AssemblerThe assembler may run on one machine and produce object code for another

An assembly is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader

(Fig Assembler)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 27

SP NOTES

For example the externally defined symbols (library program) must be indicated to the loaderthe assembler does not know the address of these symbols and it is up to the loader to find theprograms containing them load them into memory and place the values of these symbols in thecalling program

(Fig Program Translation Steps)

Fundamental assembler functions1048698 Translate mnemonic operation codes their machine language equivalents1048698 Assign machine addresses to symbolic labels used by the programmer

Basic Assembler Functions

Convert mnemonic operation codes to machine language equivalents

Convert symbolic operands to machine addresses (pass 1)

Build machine instructions in the proper format

Convert data constants to internal (machine)representations

Error checking is provided

Write the object program and assembly listing files

Assembler directives

-gtThe assembler can also process assembler directivesbull Assembler directives (or pseudo-instructions) provide instructions to the assembler itself Theyare not translated into machine instructions

-gtPseudo-Instructions Not translated into machine instructions Providing information to the assembler-gtBasic assembler directives-

bull START -Specify name and staring address for the program END- Indicate the end of the source program and (optionally) specify the first executable instruction in

the program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 28

SP NOTES

BYTE -Generate character or hexadecimal constant occupying as many bytes as needed to represent the constant

WORD- Generate one-word integer constant RESB- Reserve the indicated number of bytes for a data area RESW- Reserve the indicated number of words for a data area

Lecture_no18 General design procedure of 360-type Assembler

Syntax Assembly language statements

Assembly language statements are entered one statement per line Each statement follows the following format

[label] mnemonic [operands] [comment]

The fields in the square brackets are optional A basic instruction has two parts the first one is the name of the instruction (or the mnemonic) which is to be executed and the second are the operands or the parameters of the command

An assembly language statement contains the following fields -gt Label Field -it is an identifier amp optional fieldThe maximum lengnth of label differs between assemblers It can be used to define a symbolSome accepts upto 32 characters long other only 4 charactersA label when declared is suffixed by a colon and begins with valid characters(AhelliphellipZ)

Ex-

START LDAA 24 H

Here Label START is equal to the address of the instruction LDAA 24 H -gt MnemonicOpcode -This field contains a mnemonic It stands for operation code or pseudo-opie a machine code instruction it requires additional information ieoperands -gt Operand Field -it specifies either the address or the data that the opcode requires -gt Comment Field allows the programmer to document the software This field is optional and is only printed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 29

SP NOTES

on the source listing for documentation purposes The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character

Types of Assembly language statements

It consists of 3 typeseg

(i)Imperative statement

(ii)Declarative statements

(iii)Assembler directive statements

Advantages amp disadvantages of Assembly language

Lecture_no19 Design Specification of 1-pass assembler with example

Following steps are used for design specification of an assembler-1)Identify the information necessary to perform a task2)Specify the data structure and define format of data structure3)Determine the processing necessary to obtain and maintain the information4)Determine the processing necessary to perform a task

Assembler Design can be done in

ndashSingle pass

ndashTwo pass

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 30

Source program

Pass1

OPTAB SYMTAB

Intermediate file

Pass2

SYMTAB

Object codes

OPTAB

SP NOTES

The Intermediate file include each source statement assigned address and error indicator

Pass I

Separate the symbols mnemonic op-code and operational fields Determine the storage requirement for every assembly language statement and up date the location

counter Build the symbol table (Table that is used to store each label and its corresponding value)

Pass II

Generate object code(perform actual translation)

One-Pass Assembler

The translation performed by an assembler is essentially a collection of substitutions machine operation code

for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines

bullSingle Pass AssemblerndashDoes everything in single passndashCannot resolve the forward referencing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 31

LOCCTR

SP NOTES

(detailed pass1 assembler flowchart)

Internal data structures1048698 the Operation Code Table (OPTAB) bull OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents1048698 the Symbol Table (SYMTAB) bull SYMTAB is used to store values (addresses) assigned to labels1048698 Location Counter (LOCCTR) bull This is a variable that is used to help in the assignment of address LOCCTR not LOCCTR + (instruction length)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 32

SP NOTES

Lecture_no20 Multi-pass assembler with example

Pass 1 Assign addresses to all statements in the program Save the values assigned to all labels for use in Pass 2 Perform some processing of assembler directivesPass 2 Assemble instructions Generate data values defined by BYTE WORD Perform processing of assembler directives not done in Pass 1 Write the object program and the assembly listing

(Detailed pass2 assembler flowchart)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 33

SP NOTES

Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler

The difference between one pass and two pass assemblers are-A one pass assembler passes over the source file exactly once in the same pass collecting the labels resolving future references and doing the actual assembly The difficult part is to resolve future label references and assemble code in one pass

A two pass assembler does two passes over the source file ( the second pass can be over a file generated in the first pass ) In the first pass all it does is looks for label definitions and introduces them in the symbol table In the second pass after the symbol table is complete it does the actual assembly by translating the operations and so on

Assembler Design Machine Dependent Assembler Features instruction formats and addressing modes program relocation

Machine Independent Assembler Features literals symbol-defining statements expressions program blocks control sections and program linking

Object Program

Header Col 1 H Col 2~7 Program name Col 8~13 Starting address (hex) Col 14-19 Length of object program in bytes (hex) Text Col1 T Col2~7 Starting address in this record (hex) Col 8~9 Length of object code in this record in bytes (hex) Col 10~69Object code (69-10+1)6=10 instructions End Col1 E Col 2~7 Address of 16 Col2 first executable instruction (hex) (END program_name)

Absolute ProgramThe program must be loaded at the specified (absolute) address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 34

SP NOTES

Relocatable Program

The actual starting address is not known until load time The program can be loaded at different address of memory

How to Solvea Modification recordb Base relativec Program counter relative

Modification record Col 1 M Col 2-7 Starting location of the address field to be modified relative to the beginning of the program Col 8-9 length of the address field to be modified in halfbytes

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 35

SP NOTES

Lecture_no22 Macro language amp Macro processor

MACRO PROCESSOR

The assembly language programmer often finds it necessary to repeat some statements or block of code several times in a program The block may consist of code to swap sets of registers do some arithmetic operations In this situation the programmer find a macro instruction facility useful Macro instruction (often called macros) are single line abbreviation for group of instructions In employing a macro the programmer essentially defines a single instruction to represent a block of code For every occurrence of this one-line macro instruction in his program the macro processing assembler substitute the entire block

A macro represents a commonly used group of statements in the source pro ramming language The macro processor replaces each macro instruction with the corresponding group of source language statements This is called expanding of macros

Obj

Basic Macro Processor Functions

Macro Definition and Usage

In this section we will present one more example to highlight salient aspects of macro-processor The example is very similar to Intels 8 bit microprocessor assembly language instruction

ExampleMACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 36

Source code (with macro)

Macro Processor

Expanded code

Compiler Assembler

SP NOTES

A macro definition is placed at the start of a program enclosed between the statements MACRO and ENDM A MACRO statement indicates that a macro definition starts while statement ENDM indicates the end of a macro definition Thus a group of statements start ing with MACRO and ending with ENDM constitutes one macro definition unit If many macros are to be defined in a program as many definition

modules will exist at the start of the program Each definition module contains a new operation and defines it to consist of a sequence of assembly language statement In example above INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence

The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement Appearance of a macro name in the mnemonic field amounts to a call on the macro The assembler replaces such a statement by the statement sequence comprising the macro This is known as macro expansion

INCRMT XY

is shown to lead to insertion of the assembly statements

LOAD X

ADD Y

STORE X

in its place All macro calls in a program are expanded in this fashion

Defining a Macro

Let us take another look at the macro definition unit

MACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

The macro header statement indicates the existence of a macro definition unit Absence of the header statement as the first statement of a program or ft first statement following a macro definition unit signals the start of the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 37

SP NOTES

main assembly language program The next statement in the definition unit is die prototype for a macro call This statement names the macro and indicates how the operands in any call on the macro would be written

The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion

Basic Macro Processor Functions

-gtDirectives used during usage of Macro

Macro Indicates begin of Macro MEND indicates end of Macro

-gtPrototype for Macro

Each argument starts with Name and macro Parameter list

MEND

-gtBODY The statement will be generated as the expansion of Macro

Macro Expansion

Each macro invocation statement will be expanded into the statements that form the body of the macro

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 38

SP NOTES

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)

In the definition of macro parameter In the macro invocation argument

Lecture_no23 Machine-Independent Macro Processor Features

MACRO INVOCATION

A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro

macro_name p1 p2 hellip

Difference between macro call and procedure call

Macro call statements of the macro body are expanded each time the macro is invoked

Procedure call statements of the subroutine appear only one regardless of how many times the subroutine is called

Macro parameters

i) Positional parameter

The prototype statement indicates how operands in a macro call would be written These operands are called parameters or arguments All parameters used in the prototype statement have names starting with the special character amp These parameters are known as formal parameters A macro call is written using parameter names which do not start with ft special character amp These are known as actual parameters

The lists of formal and actual parameters also called as formal and actual parameter lists specified in the prototype and macro call statements respectively establish a correspondence between each formal parameter and an actual parameter In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists Thus the first actual parameter in the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 39

SP NOTES

fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again

INCRMT ampA ampB prototype

INCRMT XY macro call

We see that X would be paired with ampA and Y with ampB While expanding a macro call any formal parameter appearing within a model statement is replaced by the corresponding actual parameter This is how expansion of the call INCR XY heads to the following statements

LOAD X

ADD Y

STORE X

Example

STRG DATA1 DATA2 DATA3

GENER DIRECT3

(ii) Keyword parameter

Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter

ExampleSTRG ampa3=DATA1 ampa2=DATA2 ampa1=DATA3

GENER TYPE=DIRECT CHANNEL=3

Features of macro facility

(1)Macro instruction argument

(2)Macro calls within macro(nested macro calls)

(3)Macro instruction defining(macros within macro)

(4)Conditional macro ExpansionLecture_no24 Design of 1-pass macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 40

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Addressing Mode

1Implied mode Operand is in a register implied by the OpCode of the instruction ExampleADD 312 Immediate mode Operand is a constant value specified in the instruction itself ExampleADD R 103 Register mode Operand is in register specified in the instruction itself ExampleADD S D

4 Register indirect mode Operand are is in a memory address that is content of a register specifiedby the instruction itself ExampleADD (D) 35 Direct mode Absolute address of operand is explicitly specified in instruction ExampleADD 1234 S6 Indirect mode Absolute address of operand is content of a memory address ExampleADD [1234] 107 Relative mode Content of PC + OpAddr ExampleADD D $S8 Indexed mode Content of an index register + OpAddr ExampleADD D 500(S)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 19

SP NOTES

Lecture_no10 Machine language Long way no looping method

Elementary Instruction Set

Typical Data Transfer Instructions

Typical Arithmetic Instructions

Typical Logical and Bit Manipulation Instr

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 20

SP NOTES

Example Write a program that compare two positive numbers x and y Output is -1 if x lt y 0if x = y +1 if x gt y

MOVE R1 x MOVE R2 y MOVE R3 0 SUB R1 R2 BN Less BZ Zero MOVE R3 +1Less MOVE R3 -1Zero MOVE R3 0 HLT

Lecture_no11 Address modification using instruction as data ampusing index register

Lecture_no12 Looping method with example

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 21

SP NOTES

Lecture_no13 Assembly language machine language

Assembly LanguageAssembly instructions are just shorthand for machine instructions

Machine language Equivalent assembly1000000100100101 LOAD R1 51000000101000101 LOAD R2 51010000100000110 ADD R0 R1 R21000001000000110 SAVE R0 61111111111111111 HALT (For all assembly instructions that compute a result the first argument is the destination)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 22

SP NOTES

Very easy to write an Assembly Language -gtMachine language translator

Exercise

What would be the assembly instructions to swap the contents of registers 1 amp 2Exercise Solution

STORE 1 R1MOVE R1 R2LOAD R2 1HALT

Machine Language

The machine language for a particular computer is tied to the architecture of the CPU For example G4 Macs have a different machine language than Intel PCs We will look at the machine language of a simple simulated computer

Machine Language ExampleOur computer has 4 registers and 32 memory locations Each instruction is 16 bits Here is a machine language program for our simulated computer

10000001001001011000000101000101101000010000011010000010000001101111111111111111

LOAD contents of memorylocation into register

100000010 RR MMMMMex R0 = Mem[3]100000010 00 00011

STORE contents of registerinto memory location

100000100 RR MMMMMex Mem[4] = R0100000100 00 00100

MOVE contents of oneregister into another register

100100010000 RR RRex R0 = R1100100010000 00 01

ADD contents of 2 registersstore result in third

1010000100 RR RR RRex R0 = R1 + R21010000100 00 01 10

SUBTRACT contents of 2registers store result into third

1010001000 RR RR RRex R0 = R1 ndash R2

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 23

SP NOTES

1010001000 00 01 10

Halt the program 1111111111111111

Lecture_no14 Meaning of Assembly language program types Literals

bullUSING

bullBALR

bullSTART

bullEND

bullBR

bullLiteral

bullLTORG

bullBCT

bullEQU ExampleSTART EQU 10

The directive EQU stands for equals it allows you to give a numerical value to a label In this example the assembler would always replace START with 1016 Hence you could write

START EQU 10ORG STARTGOTO 10

bullORG

The directive ORG does not get translated into any code what it does is to set the address at which the next instruction will be stored

Example

ORG 0GOTO 10

The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000

But it will be placed at the address 0 ORG 0 does not get translated into any code but it affects where the code for GOTO 10 is placed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 24

SP NOTES

bullpseudo-ops vs machine-ops

Lecture_no15 Example on Assembly language program

Assembly language is a low-level programming language for a computer or other programmable device specific to a particular computer architecture in contrast to most high-level programming languages which are generally portable across multiple systems Assembly language is converted into executable machine code by a utility program referred to as an assembler like NASM MASM etc

LABEL OPCODE OPERANDS COMMENTS01 02 Program to multiply an integer by the constant 603 Before execution an integer must be stored in NUMBER0405 ORIG x305006 LEA R2 NUMBER07 LDW R2 R2 008 LEA R1 SIX09 LDW R1 R1 00A AND R3R30 Clear R3 It will0B contain the product0C The inner loop0D 0E AGAIN ADD R3R3R20F ADD R1R1-1 R1 keeps track of10 BRp AGAIN the iterations11 12 HALT13

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 25

SP NOTES

14 NUMBER BLKW 115 SIX FILL x000616 17 END

Figure 71 An assembly language programTwo of the parts (LABEL and COMMENTS) are optionalOpcodes and OperandsTwo of the parts (OPCODE and OPERANDS) are mandatory The OPCODE is a symbolic name for the opcode

The number of operands depends on the operation being performedAGAIN ADD R3R3R2The operands to be added are obtained from register 2 and from register 3 The resultis to be placed in register 3 We represent each of the registers 0 through 7 as R0 R2helliphelliphelliphellip R7

The location whose address is to be read is given the label NUMBER The destination into which that address is to be loaded is register 2LEA R2 NUMBERA literal value must contain a symbol identifying the representation base of the number We use for decimal x for hexadecimal and b for binary

LabelsLabels are symbolic names which are used to identify memory locations that are referred to explicitly in the programThere are two reasons for explicitly referring to a memory location1 The location contains the target of a branch instruction (for example AGAIN in line 0E)2 The location contains a value that is loaded or stored (for example NUMBER line 14 and SIX line 15)The location AGAIN is specifically referenced by the branch instruction in line 10

BRp AGAIN

If the result of ADD R1R1ndash1 is positive (as evidenced by the P condition code being set) then the program branches to the location explicitly referenced as AGAIN to perform another iterationThe value stored in the memory location explicitly referenced as NUMBER is loaded into R2If a location in the program is not explicitly referenced then there is no need to give it a label

CommentsComments are messages intended only for human consumption The purpose of comments is to make the program more comprehensible to the human reader

Pseudo-ops (Assembler Directives)

Actually a more formal name for a pseudo-op is assembler directive They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution Rather the pseudo-op is strictly a message to the assembler to help the assembler in the assembly processORIG

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 26

SP NOTES

ORIG tells the assembler where in memory to place the program In line 05 ORIG x3050 says start with location x3050 As a result the LEA R2NUMBER instruction will be put in location x3050FILLFILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand In line 15 the ninth location in the resultant program is initialized to the value x0006BLKWBLKW tells the assembler to set aside some number of sequential memory locations (ie a BLocK ofWords) in the program The actual number is the operand of the BLKW pseudo-op In line 11 the pseudo-op instructs the assembler to set aside one location in memory

Lecture_no16 surprise test

Lecture_no17 MODULE-2

AssemblerWhy learning AssemblerAssembler or other languages that is the question Why should I learn another language if I alreadylearned other programming languages The best argument while you live in France you are able to getthrough by speaking english but you will never feel at home then and life remains complicated You canget through with this but it is rather inappropriate If things need a hurry you should use the countryslanguageAssemblerTo translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmerDisassenblerTo translate machine code to their assembly source

Cross AssemblerThe assembler may run on one machine and produce object code for another

An assembly is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader

(Fig Assembler)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 27

SP NOTES

For example the externally defined symbols (library program) must be indicated to the loaderthe assembler does not know the address of these symbols and it is up to the loader to find theprograms containing them load them into memory and place the values of these symbols in thecalling program

(Fig Program Translation Steps)

Fundamental assembler functions1048698 Translate mnemonic operation codes their machine language equivalents1048698 Assign machine addresses to symbolic labels used by the programmer

Basic Assembler Functions

Convert mnemonic operation codes to machine language equivalents

Convert symbolic operands to machine addresses (pass 1)

Build machine instructions in the proper format

Convert data constants to internal (machine)representations

Error checking is provided

Write the object program and assembly listing files

Assembler directives

-gtThe assembler can also process assembler directivesbull Assembler directives (or pseudo-instructions) provide instructions to the assembler itself Theyare not translated into machine instructions

-gtPseudo-Instructions Not translated into machine instructions Providing information to the assembler-gtBasic assembler directives-

bull START -Specify name and staring address for the program END- Indicate the end of the source program and (optionally) specify the first executable instruction in

the program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 28

SP NOTES

BYTE -Generate character or hexadecimal constant occupying as many bytes as needed to represent the constant

WORD- Generate one-word integer constant RESB- Reserve the indicated number of bytes for a data area RESW- Reserve the indicated number of words for a data area

Lecture_no18 General design procedure of 360-type Assembler

Syntax Assembly language statements

Assembly language statements are entered one statement per line Each statement follows the following format

[label] mnemonic [operands] [comment]

The fields in the square brackets are optional A basic instruction has two parts the first one is the name of the instruction (or the mnemonic) which is to be executed and the second are the operands or the parameters of the command

An assembly language statement contains the following fields -gt Label Field -it is an identifier amp optional fieldThe maximum lengnth of label differs between assemblers It can be used to define a symbolSome accepts upto 32 characters long other only 4 charactersA label when declared is suffixed by a colon and begins with valid characters(AhelliphellipZ)

Ex-

START LDAA 24 H

Here Label START is equal to the address of the instruction LDAA 24 H -gt MnemonicOpcode -This field contains a mnemonic It stands for operation code or pseudo-opie a machine code instruction it requires additional information ieoperands -gt Operand Field -it specifies either the address or the data that the opcode requires -gt Comment Field allows the programmer to document the software This field is optional and is only printed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 29

SP NOTES

on the source listing for documentation purposes The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character

Types of Assembly language statements

It consists of 3 typeseg

(i)Imperative statement

(ii)Declarative statements

(iii)Assembler directive statements

Advantages amp disadvantages of Assembly language

Lecture_no19 Design Specification of 1-pass assembler with example

Following steps are used for design specification of an assembler-1)Identify the information necessary to perform a task2)Specify the data structure and define format of data structure3)Determine the processing necessary to obtain and maintain the information4)Determine the processing necessary to perform a task

Assembler Design can be done in

ndashSingle pass

ndashTwo pass

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 30

Source program

Pass1

OPTAB SYMTAB

Intermediate file

Pass2

SYMTAB

Object codes

OPTAB

SP NOTES

The Intermediate file include each source statement assigned address and error indicator

Pass I

Separate the symbols mnemonic op-code and operational fields Determine the storage requirement for every assembly language statement and up date the location

counter Build the symbol table (Table that is used to store each label and its corresponding value)

Pass II

Generate object code(perform actual translation)

One-Pass Assembler

The translation performed by an assembler is essentially a collection of substitutions machine operation code

for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines

bullSingle Pass AssemblerndashDoes everything in single passndashCannot resolve the forward referencing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 31

LOCCTR

SP NOTES

(detailed pass1 assembler flowchart)

Internal data structures1048698 the Operation Code Table (OPTAB) bull OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents1048698 the Symbol Table (SYMTAB) bull SYMTAB is used to store values (addresses) assigned to labels1048698 Location Counter (LOCCTR) bull This is a variable that is used to help in the assignment of address LOCCTR not LOCCTR + (instruction length)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 32

SP NOTES

Lecture_no20 Multi-pass assembler with example

Pass 1 Assign addresses to all statements in the program Save the values assigned to all labels for use in Pass 2 Perform some processing of assembler directivesPass 2 Assemble instructions Generate data values defined by BYTE WORD Perform processing of assembler directives not done in Pass 1 Write the object program and the assembly listing

(Detailed pass2 assembler flowchart)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 33

SP NOTES

Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler

The difference between one pass and two pass assemblers are-A one pass assembler passes over the source file exactly once in the same pass collecting the labels resolving future references and doing the actual assembly The difficult part is to resolve future label references and assemble code in one pass

A two pass assembler does two passes over the source file ( the second pass can be over a file generated in the first pass ) In the first pass all it does is looks for label definitions and introduces them in the symbol table In the second pass after the symbol table is complete it does the actual assembly by translating the operations and so on

Assembler Design Machine Dependent Assembler Features instruction formats and addressing modes program relocation

Machine Independent Assembler Features literals symbol-defining statements expressions program blocks control sections and program linking

Object Program

Header Col 1 H Col 2~7 Program name Col 8~13 Starting address (hex) Col 14-19 Length of object program in bytes (hex) Text Col1 T Col2~7 Starting address in this record (hex) Col 8~9 Length of object code in this record in bytes (hex) Col 10~69Object code (69-10+1)6=10 instructions End Col1 E Col 2~7 Address of 16 Col2 first executable instruction (hex) (END program_name)

Absolute ProgramThe program must be loaded at the specified (absolute) address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 34

SP NOTES

Relocatable Program

The actual starting address is not known until load time The program can be loaded at different address of memory

How to Solvea Modification recordb Base relativec Program counter relative

Modification record Col 1 M Col 2-7 Starting location of the address field to be modified relative to the beginning of the program Col 8-9 length of the address field to be modified in halfbytes

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 35

SP NOTES

Lecture_no22 Macro language amp Macro processor

MACRO PROCESSOR

The assembly language programmer often finds it necessary to repeat some statements or block of code several times in a program The block may consist of code to swap sets of registers do some arithmetic operations In this situation the programmer find a macro instruction facility useful Macro instruction (often called macros) are single line abbreviation for group of instructions In employing a macro the programmer essentially defines a single instruction to represent a block of code For every occurrence of this one-line macro instruction in his program the macro processing assembler substitute the entire block

A macro represents a commonly used group of statements in the source pro ramming language The macro processor replaces each macro instruction with the corresponding group of source language statements This is called expanding of macros

Obj

Basic Macro Processor Functions

Macro Definition and Usage

In this section we will present one more example to highlight salient aspects of macro-processor The example is very similar to Intels 8 bit microprocessor assembly language instruction

ExampleMACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 36

Source code (with macro)

Macro Processor

Expanded code

Compiler Assembler

SP NOTES

A macro definition is placed at the start of a program enclosed between the statements MACRO and ENDM A MACRO statement indicates that a macro definition starts while statement ENDM indicates the end of a macro definition Thus a group of statements start ing with MACRO and ending with ENDM constitutes one macro definition unit If many macros are to be defined in a program as many definition

modules will exist at the start of the program Each definition module contains a new operation and defines it to consist of a sequence of assembly language statement In example above INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence

The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement Appearance of a macro name in the mnemonic field amounts to a call on the macro The assembler replaces such a statement by the statement sequence comprising the macro This is known as macro expansion

INCRMT XY

is shown to lead to insertion of the assembly statements

LOAD X

ADD Y

STORE X

in its place All macro calls in a program are expanded in this fashion

Defining a Macro

Let us take another look at the macro definition unit

MACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

The macro header statement indicates the existence of a macro definition unit Absence of the header statement as the first statement of a program or ft first statement following a macro definition unit signals the start of the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 37

SP NOTES

main assembly language program The next statement in the definition unit is die prototype for a macro call This statement names the macro and indicates how the operands in any call on the macro would be written

The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion

Basic Macro Processor Functions

-gtDirectives used during usage of Macro

Macro Indicates begin of Macro MEND indicates end of Macro

-gtPrototype for Macro

Each argument starts with Name and macro Parameter list

MEND

-gtBODY The statement will be generated as the expansion of Macro

Macro Expansion

Each macro invocation statement will be expanded into the statements that form the body of the macro

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 38

SP NOTES

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)

In the definition of macro parameter In the macro invocation argument

Lecture_no23 Machine-Independent Macro Processor Features

MACRO INVOCATION

A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro

macro_name p1 p2 hellip

Difference between macro call and procedure call

Macro call statements of the macro body are expanded each time the macro is invoked

Procedure call statements of the subroutine appear only one regardless of how many times the subroutine is called

Macro parameters

i) Positional parameter

The prototype statement indicates how operands in a macro call would be written These operands are called parameters or arguments All parameters used in the prototype statement have names starting with the special character amp These parameters are known as formal parameters A macro call is written using parameter names which do not start with ft special character amp These are known as actual parameters

The lists of formal and actual parameters also called as formal and actual parameter lists specified in the prototype and macro call statements respectively establish a correspondence between each formal parameter and an actual parameter In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists Thus the first actual parameter in the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 39

SP NOTES

fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again

INCRMT ampA ampB prototype

INCRMT XY macro call

We see that X would be paired with ampA and Y with ampB While expanding a macro call any formal parameter appearing within a model statement is replaced by the corresponding actual parameter This is how expansion of the call INCR XY heads to the following statements

LOAD X

ADD Y

STORE X

Example

STRG DATA1 DATA2 DATA3

GENER DIRECT3

(ii) Keyword parameter

Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter

ExampleSTRG ampa3=DATA1 ampa2=DATA2 ampa1=DATA3

GENER TYPE=DIRECT CHANNEL=3

Features of macro facility

(1)Macro instruction argument

(2)Macro calls within macro(nested macro calls)

(3)Macro instruction defining(macros within macro)

(4)Conditional macro ExpansionLecture_no24 Design of 1-pass macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 40

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Lecture_no10 Machine language Long way no looping method

Elementary Instruction Set

Typical Data Transfer Instructions

Typical Arithmetic Instructions

Typical Logical and Bit Manipulation Instr

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 20

SP NOTES

Example Write a program that compare two positive numbers x and y Output is -1 if x lt y 0if x = y +1 if x gt y

MOVE R1 x MOVE R2 y MOVE R3 0 SUB R1 R2 BN Less BZ Zero MOVE R3 +1Less MOVE R3 -1Zero MOVE R3 0 HLT

Lecture_no11 Address modification using instruction as data ampusing index register

Lecture_no12 Looping method with example

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 21

SP NOTES

Lecture_no13 Assembly language machine language

Assembly LanguageAssembly instructions are just shorthand for machine instructions

Machine language Equivalent assembly1000000100100101 LOAD R1 51000000101000101 LOAD R2 51010000100000110 ADD R0 R1 R21000001000000110 SAVE R0 61111111111111111 HALT (For all assembly instructions that compute a result the first argument is the destination)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 22

SP NOTES

Very easy to write an Assembly Language -gtMachine language translator

Exercise

What would be the assembly instructions to swap the contents of registers 1 amp 2Exercise Solution

STORE 1 R1MOVE R1 R2LOAD R2 1HALT

Machine Language

The machine language for a particular computer is tied to the architecture of the CPU For example G4 Macs have a different machine language than Intel PCs We will look at the machine language of a simple simulated computer

Machine Language ExampleOur computer has 4 registers and 32 memory locations Each instruction is 16 bits Here is a machine language program for our simulated computer

10000001001001011000000101000101101000010000011010000010000001101111111111111111

LOAD contents of memorylocation into register

100000010 RR MMMMMex R0 = Mem[3]100000010 00 00011

STORE contents of registerinto memory location

100000100 RR MMMMMex Mem[4] = R0100000100 00 00100

MOVE contents of oneregister into another register

100100010000 RR RRex R0 = R1100100010000 00 01

ADD contents of 2 registersstore result in third

1010000100 RR RR RRex R0 = R1 + R21010000100 00 01 10

SUBTRACT contents of 2registers store result into third

1010001000 RR RR RRex R0 = R1 ndash R2

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 23

SP NOTES

1010001000 00 01 10

Halt the program 1111111111111111

Lecture_no14 Meaning of Assembly language program types Literals

bullUSING

bullBALR

bullSTART

bullEND

bullBR

bullLiteral

bullLTORG

bullBCT

bullEQU ExampleSTART EQU 10

The directive EQU stands for equals it allows you to give a numerical value to a label In this example the assembler would always replace START with 1016 Hence you could write

START EQU 10ORG STARTGOTO 10

bullORG

The directive ORG does not get translated into any code what it does is to set the address at which the next instruction will be stored

Example

ORG 0GOTO 10

The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000

But it will be placed at the address 0 ORG 0 does not get translated into any code but it affects where the code for GOTO 10 is placed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 24

SP NOTES

bullpseudo-ops vs machine-ops

Lecture_no15 Example on Assembly language program

Assembly language is a low-level programming language for a computer or other programmable device specific to a particular computer architecture in contrast to most high-level programming languages which are generally portable across multiple systems Assembly language is converted into executable machine code by a utility program referred to as an assembler like NASM MASM etc

LABEL OPCODE OPERANDS COMMENTS01 02 Program to multiply an integer by the constant 603 Before execution an integer must be stored in NUMBER0405 ORIG x305006 LEA R2 NUMBER07 LDW R2 R2 008 LEA R1 SIX09 LDW R1 R1 00A AND R3R30 Clear R3 It will0B contain the product0C The inner loop0D 0E AGAIN ADD R3R3R20F ADD R1R1-1 R1 keeps track of10 BRp AGAIN the iterations11 12 HALT13

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 25

SP NOTES

14 NUMBER BLKW 115 SIX FILL x000616 17 END

Figure 71 An assembly language programTwo of the parts (LABEL and COMMENTS) are optionalOpcodes and OperandsTwo of the parts (OPCODE and OPERANDS) are mandatory The OPCODE is a symbolic name for the opcode

The number of operands depends on the operation being performedAGAIN ADD R3R3R2The operands to be added are obtained from register 2 and from register 3 The resultis to be placed in register 3 We represent each of the registers 0 through 7 as R0 R2helliphelliphelliphellip R7

The location whose address is to be read is given the label NUMBER The destination into which that address is to be loaded is register 2LEA R2 NUMBERA literal value must contain a symbol identifying the representation base of the number We use for decimal x for hexadecimal and b for binary

LabelsLabels are symbolic names which are used to identify memory locations that are referred to explicitly in the programThere are two reasons for explicitly referring to a memory location1 The location contains the target of a branch instruction (for example AGAIN in line 0E)2 The location contains a value that is loaded or stored (for example NUMBER line 14 and SIX line 15)The location AGAIN is specifically referenced by the branch instruction in line 10

BRp AGAIN

If the result of ADD R1R1ndash1 is positive (as evidenced by the P condition code being set) then the program branches to the location explicitly referenced as AGAIN to perform another iterationThe value stored in the memory location explicitly referenced as NUMBER is loaded into R2If a location in the program is not explicitly referenced then there is no need to give it a label

CommentsComments are messages intended only for human consumption The purpose of comments is to make the program more comprehensible to the human reader

Pseudo-ops (Assembler Directives)

Actually a more formal name for a pseudo-op is assembler directive They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution Rather the pseudo-op is strictly a message to the assembler to help the assembler in the assembly processORIG

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 26

SP NOTES

ORIG tells the assembler where in memory to place the program In line 05 ORIG x3050 says start with location x3050 As a result the LEA R2NUMBER instruction will be put in location x3050FILLFILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand In line 15 the ninth location in the resultant program is initialized to the value x0006BLKWBLKW tells the assembler to set aside some number of sequential memory locations (ie a BLocK ofWords) in the program The actual number is the operand of the BLKW pseudo-op In line 11 the pseudo-op instructs the assembler to set aside one location in memory

Lecture_no16 surprise test

Lecture_no17 MODULE-2

AssemblerWhy learning AssemblerAssembler or other languages that is the question Why should I learn another language if I alreadylearned other programming languages The best argument while you live in France you are able to getthrough by speaking english but you will never feel at home then and life remains complicated You canget through with this but it is rather inappropriate If things need a hurry you should use the countryslanguageAssemblerTo translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmerDisassenblerTo translate machine code to their assembly source

Cross AssemblerThe assembler may run on one machine and produce object code for another

An assembly is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader

(Fig Assembler)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 27

SP NOTES

For example the externally defined symbols (library program) must be indicated to the loaderthe assembler does not know the address of these symbols and it is up to the loader to find theprograms containing them load them into memory and place the values of these symbols in thecalling program

(Fig Program Translation Steps)

Fundamental assembler functions1048698 Translate mnemonic operation codes their machine language equivalents1048698 Assign machine addresses to symbolic labels used by the programmer

Basic Assembler Functions

Convert mnemonic operation codes to machine language equivalents

Convert symbolic operands to machine addresses (pass 1)

Build machine instructions in the proper format

Convert data constants to internal (machine)representations

Error checking is provided

Write the object program and assembly listing files

Assembler directives

-gtThe assembler can also process assembler directivesbull Assembler directives (or pseudo-instructions) provide instructions to the assembler itself Theyare not translated into machine instructions

-gtPseudo-Instructions Not translated into machine instructions Providing information to the assembler-gtBasic assembler directives-

bull START -Specify name and staring address for the program END- Indicate the end of the source program and (optionally) specify the first executable instruction in

the program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 28

SP NOTES

BYTE -Generate character or hexadecimal constant occupying as many bytes as needed to represent the constant

WORD- Generate one-word integer constant RESB- Reserve the indicated number of bytes for a data area RESW- Reserve the indicated number of words for a data area

Lecture_no18 General design procedure of 360-type Assembler

Syntax Assembly language statements

Assembly language statements are entered one statement per line Each statement follows the following format

[label] mnemonic [operands] [comment]

The fields in the square brackets are optional A basic instruction has two parts the first one is the name of the instruction (or the mnemonic) which is to be executed and the second are the operands or the parameters of the command

An assembly language statement contains the following fields -gt Label Field -it is an identifier amp optional fieldThe maximum lengnth of label differs between assemblers It can be used to define a symbolSome accepts upto 32 characters long other only 4 charactersA label when declared is suffixed by a colon and begins with valid characters(AhelliphellipZ)

Ex-

START LDAA 24 H

Here Label START is equal to the address of the instruction LDAA 24 H -gt MnemonicOpcode -This field contains a mnemonic It stands for operation code or pseudo-opie a machine code instruction it requires additional information ieoperands -gt Operand Field -it specifies either the address or the data that the opcode requires -gt Comment Field allows the programmer to document the software This field is optional and is only printed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 29

SP NOTES

on the source listing for documentation purposes The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character

Types of Assembly language statements

It consists of 3 typeseg

(i)Imperative statement

(ii)Declarative statements

(iii)Assembler directive statements

Advantages amp disadvantages of Assembly language

Lecture_no19 Design Specification of 1-pass assembler with example

Following steps are used for design specification of an assembler-1)Identify the information necessary to perform a task2)Specify the data structure and define format of data structure3)Determine the processing necessary to obtain and maintain the information4)Determine the processing necessary to perform a task

Assembler Design can be done in

ndashSingle pass

ndashTwo pass

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 30

Source program

Pass1

OPTAB SYMTAB

Intermediate file

Pass2

SYMTAB

Object codes

OPTAB

SP NOTES

The Intermediate file include each source statement assigned address and error indicator

Pass I

Separate the symbols mnemonic op-code and operational fields Determine the storage requirement for every assembly language statement and up date the location

counter Build the symbol table (Table that is used to store each label and its corresponding value)

Pass II

Generate object code(perform actual translation)

One-Pass Assembler

The translation performed by an assembler is essentially a collection of substitutions machine operation code

for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines

bullSingle Pass AssemblerndashDoes everything in single passndashCannot resolve the forward referencing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 31

LOCCTR

SP NOTES

(detailed pass1 assembler flowchart)

Internal data structures1048698 the Operation Code Table (OPTAB) bull OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents1048698 the Symbol Table (SYMTAB) bull SYMTAB is used to store values (addresses) assigned to labels1048698 Location Counter (LOCCTR) bull This is a variable that is used to help in the assignment of address LOCCTR not LOCCTR + (instruction length)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 32

SP NOTES

Lecture_no20 Multi-pass assembler with example

Pass 1 Assign addresses to all statements in the program Save the values assigned to all labels for use in Pass 2 Perform some processing of assembler directivesPass 2 Assemble instructions Generate data values defined by BYTE WORD Perform processing of assembler directives not done in Pass 1 Write the object program and the assembly listing

(Detailed pass2 assembler flowchart)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 33

SP NOTES

Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler

The difference between one pass and two pass assemblers are-A one pass assembler passes over the source file exactly once in the same pass collecting the labels resolving future references and doing the actual assembly The difficult part is to resolve future label references and assemble code in one pass

A two pass assembler does two passes over the source file ( the second pass can be over a file generated in the first pass ) In the first pass all it does is looks for label definitions and introduces them in the symbol table In the second pass after the symbol table is complete it does the actual assembly by translating the operations and so on

Assembler Design Machine Dependent Assembler Features instruction formats and addressing modes program relocation

Machine Independent Assembler Features literals symbol-defining statements expressions program blocks control sections and program linking

Object Program

Header Col 1 H Col 2~7 Program name Col 8~13 Starting address (hex) Col 14-19 Length of object program in bytes (hex) Text Col1 T Col2~7 Starting address in this record (hex) Col 8~9 Length of object code in this record in bytes (hex) Col 10~69Object code (69-10+1)6=10 instructions End Col1 E Col 2~7 Address of 16 Col2 first executable instruction (hex) (END program_name)

Absolute ProgramThe program must be loaded at the specified (absolute) address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 34

SP NOTES

Relocatable Program

The actual starting address is not known until load time The program can be loaded at different address of memory

How to Solvea Modification recordb Base relativec Program counter relative

Modification record Col 1 M Col 2-7 Starting location of the address field to be modified relative to the beginning of the program Col 8-9 length of the address field to be modified in halfbytes

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 35

SP NOTES

Lecture_no22 Macro language amp Macro processor

MACRO PROCESSOR

The assembly language programmer often finds it necessary to repeat some statements or block of code several times in a program The block may consist of code to swap sets of registers do some arithmetic operations In this situation the programmer find a macro instruction facility useful Macro instruction (often called macros) are single line abbreviation for group of instructions In employing a macro the programmer essentially defines a single instruction to represent a block of code For every occurrence of this one-line macro instruction in his program the macro processing assembler substitute the entire block

A macro represents a commonly used group of statements in the source pro ramming language The macro processor replaces each macro instruction with the corresponding group of source language statements This is called expanding of macros

Obj

Basic Macro Processor Functions

Macro Definition and Usage

In this section we will present one more example to highlight salient aspects of macro-processor The example is very similar to Intels 8 bit microprocessor assembly language instruction

ExampleMACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 36

Source code (with macro)

Macro Processor

Expanded code

Compiler Assembler

SP NOTES

A macro definition is placed at the start of a program enclosed between the statements MACRO and ENDM A MACRO statement indicates that a macro definition starts while statement ENDM indicates the end of a macro definition Thus a group of statements start ing with MACRO and ending with ENDM constitutes one macro definition unit If many macros are to be defined in a program as many definition

modules will exist at the start of the program Each definition module contains a new operation and defines it to consist of a sequence of assembly language statement In example above INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence

The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement Appearance of a macro name in the mnemonic field amounts to a call on the macro The assembler replaces such a statement by the statement sequence comprising the macro This is known as macro expansion

INCRMT XY

is shown to lead to insertion of the assembly statements

LOAD X

ADD Y

STORE X

in its place All macro calls in a program are expanded in this fashion

Defining a Macro

Let us take another look at the macro definition unit

MACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

The macro header statement indicates the existence of a macro definition unit Absence of the header statement as the first statement of a program or ft first statement following a macro definition unit signals the start of the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 37

SP NOTES

main assembly language program The next statement in the definition unit is die prototype for a macro call This statement names the macro and indicates how the operands in any call on the macro would be written

The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion

Basic Macro Processor Functions

-gtDirectives used during usage of Macro

Macro Indicates begin of Macro MEND indicates end of Macro

-gtPrototype for Macro

Each argument starts with Name and macro Parameter list

MEND

-gtBODY The statement will be generated as the expansion of Macro

Macro Expansion

Each macro invocation statement will be expanded into the statements that form the body of the macro

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 38

SP NOTES

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)

In the definition of macro parameter In the macro invocation argument

Lecture_no23 Machine-Independent Macro Processor Features

MACRO INVOCATION

A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro

macro_name p1 p2 hellip

Difference between macro call and procedure call

Macro call statements of the macro body are expanded each time the macro is invoked

Procedure call statements of the subroutine appear only one regardless of how many times the subroutine is called

Macro parameters

i) Positional parameter

The prototype statement indicates how operands in a macro call would be written These operands are called parameters or arguments All parameters used in the prototype statement have names starting with the special character amp These parameters are known as formal parameters A macro call is written using parameter names which do not start with ft special character amp These are known as actual parameters

The lists of formal and actual parameters also called as formal and actual parameter lists specified in the prototype and macro call statements respectively establish a correspondence between each formal parameter and an actual parameter In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists Thus the first actual parameter in the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 39

SP NOTES

fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again

INCRMT ampA ampB prototype

INCRMT XY macro call

We see that X would be paired with ampA and Y with ampB While expanding a macro call any formal parameter appearing within a model statement is replaced by the corresponding actual parameter This is how expansion of the call INCR XY heads to the following statements

LOAD X

ADD Y

STORE X

Example

STRG DATA1 DATA2 DATA3

GENER DIRECT3

(ii) Keyword parameter

Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter

ExampleSTRG ampa3=DATA1 ampa2=DATA2 ampa1=DATA3

GENER TYPE=DIRECT CHANNEL=3

Features of macro facility

(1)Macro instruction argument

(2)Macro calls within macro(nested macro calls)

(3)Macro instruction defining(macros within macro)

(4)Conditional macro ExpansionLecture_no24 Design of 1-pass macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 40

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Example Write a program that compare two positive numbers x and y Output is -1 if x lt y 0if x = y +1 if x gt y

MOVE R1 x MOVE R2 y MOVE R3 0 SUB R1 R2 BN Less BZ Zero MOVE R3 +1Less MOVE R3 -1Zero MOVE R3 0 HLT

Lecture_no11 Address modification using instruction as data ampusing index register

Lecture_no12 Looping method with example

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 21

SP NOTES

Lecture_no13 Assembly language machine language

Assembly LanguageAssembly instructions are just shorthand for machine instructions

Machine language Equivalent assembly1000000100100101 LOAD R1 51000000101000101 LOAD R2 51010000100000110 ADD R0 R1 R21000001000000110 SAVE R0 61111111111111111 HALT (For all assembly instructions that compute a result the first argument is the destination)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 22

SP NOTES

Very easy to write an Assembly Language -gtMachine language translator

Exercise

What would be the assembly instructions to swap the contents of registers 1 amp 2Exercise Solution

STORE 1 R1MOVE R1 R2LOAD R2 1HALT

Machine Language

The machine language for a particular computer is tied to the architecture of the CPU For example G4 Macs have a different machine language than Intel PCs We will look at the machine language of a simple simulated computer

Machine Language ExampleOur computer has 4 registers and 32 memory locations Each instruction is 16 bits Here is a machine language program for our simulated computer

10000001001001011000000101000101101000010000011010000010000001101111111111111111

LOAD contents of memorylocation into register

100000010 RR MMMMMex R0 = Mem[3]100000010 00 00011

STORE contents of registerinto memory location

100000100 RR MMMMMex Mem[4] = R0100000100 00 00100

MOVE contents of oneregister into another register

100100010000 RR RRex R0 = R1100100010000 00 01

ADD contents of 2 registersstore result in third

1010000100 RR RR RRex R0 = R1 + R21010000100 00 01 10

SUBTRACT contents of 2registers store result into third

1010001000 RR RR RRex R0 = R1 ndash R2

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 23

SP NOTES

1010001000 00 01 10

Halt the program 1111111111111111

Lecture_no14 Meaning of Assembly language program types Literals

bullUSING

bullBALR

bullSTART

bullEND

bullBR

bullLiteral

bullLTORG

bullBCT

bullEQU ExampleSTART EQU 10

The directive EQU stands for equals it allows you to give a numerical value to a label In this example the assembler would always replace START with 1016 Hence you could write

START EQU 10ORG STARTGOTO 10

bullORG

The directive ORG does not get translated into any code what it does is to set the address at which the next instruction will be stored

Example

ORG 0GOTO 10

The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000

But it will be placed at the address 0 ORG 0 does not get translated into any code but it affects where the code for GOTO 10 is placed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 24

SP NOTES

bullpseudo-ops vs machine-ops

Lecture_no15 Example on Assembly language program

Assembly language is a low-level programming language for a computer or other programmable device specific to a particular computer architecture in contrast to most high-level programming languages which are generally portable across multiple systems Assembly language is converted into executable machine code by a utility program referred to as an assembler like NASM MASM etc

LABEL OPCODE OPERANDS COMMENTS01 02 Program to multiply an integer by the constant 603 Before execution an integer must be stored in NUMBER0405 ORIG x305006 LEA R2 NUMBER07 LDW R2 R2 008 LEA R1 SIX09 LDW R1 R1 00A AND R3R30 Clear R3 It will0B contain the product0C The inner loop0D 0E AGAIN ADD R3R3R20F ADD R1R1-1 R1 keeps track of10 BRp AGAIN the iterations11 12 HALT13

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 25

SP NOTES

14 NUMBER BLKW 115 SIX FILL x000616 17 END

Figure 71 An assembly language programTwo of the parts (LABEL and COMMENTS) are optionalOpcodes and OperandsTwo of the parts (OPCODE and OPERANDS) are mandatory The OPCODE is a symbolic name for the opcode

The number of operands depends on the operation being performedAGAIN ADD R3R3R2The operands to be added are obtained from register 2 and from register 3 The resultis to be placed in register 3 We represent each of the registers 0 through 7 as R0 R2helliphelliphelliphellip R7

The location whose address is to be read is given the label NUMBER The destination into which that address is to be loaded is register 2LEA R2 NUMBERA literal value must contain a symbol identifying the representation base of the number We use for decimal x for hexadecimal and b for binary

LabelsLabels are symbolic names which are used to identify memory locations that are referred to explicitly in the programThere are two reasons for explicitly referring to a memory location1 The location contains the target of a branch instruction (for example AGAIN in line 0E)2 The location contains a value that is loaded or stored (for example NUMBER line 14 and SIX line 15)The location AGAIN is specifically referenced by the branch instruction in line 10

BRp AGAIN

If the result of ADD R1R1ndash1 is positive (as evidenced by the P condition code being set) then the program branches to the location explicitly referenced as AGAIN to perform another iterationThe value stored in the memory location explicitly referenced as NUMBER is loaded into R2If a location in the program is not explicitly referenced then there is no need to give it a label

CommentsComments are messages intended only for human consumption The purpose of comments is to make the program more comprehensible to the human reader

Pseudo-ops (Assembler Directives)

Actually a more formal name for a pseudo-op is assembler directive They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution Rather the pseudo-op is strictly a message to the assembler to help the assembler in the assembly processORIG

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 26

SP NOTES

ORIG tells the assembler where in memory to place the program In line 05 ORIG x3050 says start with location x3050 As a result the LEA R2NUMBER instruction will be put in location x3050FILLFILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand In line 15 the ninth location in the resultant program is initialized to the value x0006BLKWBLKW tells the assembler to set aside some number of sequential memory locations (ie a BLocK ofWords) in the program The actual number is the operand of the BLKW pseudo-op In line 11 the pseudo-op instructs the assembler to set aside one location in memory

Lecture_no16 surprise test

Lecture_no17 MODULE-2

AssemblerWhy learning AssemblerAssembler or other languages that is the question Why should I learn another language if I alreadylearned other programming languages The best argument while you live in France you are able to getthrough by speaking english but you will never feel at home then and life remains complicated You canget through with this but it is rather inappropriate If things need a hurry you should use the countryslanguageAssemblerTo translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmerDisassenblerTo translate machine code to their assembly source

Cross AssemblerThe assembler may run on one machine and produce object code for another

An assembly is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader

(Fig Assembler)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 27

SP NOTES

For example the externally defined symbols (library program) must be indicated to the loaderthe assembler does not know the address of these symbols and it is up to the loader to find theprograms containing them load them into memory and place the values of these symbols in thecalling program

(Fig Program Translation Steps)

Fundamental assembler functions1048698 Translate mnemonic operation codes their machine language equivalents1048698 Assign machine addresses to symbolic labels used by the programmer

Basic Assembler Functions

Convert mnemonic operation codes to machine language equivalents

Convert symbolic operands to machine addresses (pass 1)

Build machine instructions in the proper format

Convert data constants to internal (machine)representations

Error checking is provided

Write the object program and assembly listing files

Assembler directives

-gtThe assembler can also process assembler directivesbull Assembler directives (or pseudo-instructions) provide instructions to the assembler itself Theyare not translated into machine instructions

-gtPseudo-Instructions Not translated into machine instructions Providing information to the assembler-gtBasic assembler directives-

bull START -Specify name and staring address for the program END- Indicate the end of the source program and (optionally) specify the first executable instruction in

the program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 28

SP NOTES

BYTE -Generate character or hexadecimal constant occupying as many bytes as needed to represent the constant

WORD- Generate one-word integer constant RESB- Reserve the indicated number of bytes for a data area RESW- Reserve the indicated number of words for a data area

Lecture_no18 General design procedure of 360-type Assembler

Syntax Assembly language statements

Assembly language statements are entered one statement per line Each statement follows the following format

[label] mnemonic [operands] [comment]

The fields in the square brackets are optional A basic instruction has two parts the first one is the name of the instruction (or the mnemonic) which is to be executed and the second are the operands or the parameters of the command

An assembly language statement contains the following fields -gt Label Field -it is an identifier amp optional fieldThe maximum lengnth of label differs between assemblers It can be used to define a symbolSome accepts upto 32 characters long other only 4 charactersA label when declared is suffixed by a colon and begins with valid characters(AhelliphellipZ)

Ex-

START LDAA 24 H

Here Label START is equal to the address of the instruction LDAA 24 H -gt MnemonicOpcode -This field contains a mnemonic It stands for operation code or pseudo-opie a machine code instruction it requires additional information ieoperands -gt Operand Field -it specifies either the address or the data that the opcode requires -gt Comment Field allows the programmer to document the software This field is optional and is only printed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 29

SP NOTES

on the source listing for documentation purposes The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character

Types of Assembly language statements

It consists of 3 typeseg

(i)Imperative statement

(ii)Declarative statements

(iii)Assembler directive statements

Advantages amp disadvantages of Assembly language

Lecture_no19 Design Specification of 1-pass assembler with example

Following steps are used for design specification of an assembler-1)Identify the information necessary to perform a task2)Specify the data structure and define format of data structure3)Determine the processing necessary to obtain and maintain the information4)Determine the processing necessary to perform a task

Assembler Design can be done in

ndashSingle pass

ndashTwo pass

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 30

Source program

Pass1

OPTAB SYMTAB

Intermediate file

Pass2

SYMTAB

Object codes

OPTAB

SP NOTES

The Intermediate file include each source statement assigned address and error indicator

Pass I

Separate the symbols mnemonic op-code and operational fields Determine the storage requirement for every assembly language statement and up date the location

counter Build the symbol table (Table that is used to store each label and its corresponding value)

Pass II

Generate object code(perform actual translation)

One-Pass Assembler

The translation performed by an assembler is essentially a collection of substitutions machine operation code

for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines

bullSingle Pass AssemblerndashDoes everything in single passndashCannot resolve the forward referencing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 31

LOCCTR

SP NOTES

(detailed pass1 assembler flowchart)

Internal data structures1048698 the Operation Code Table (OPTAB) bull OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents1048698 the Symbol Table (SYMTAB) bull SYMTAB is used to store values (addresses) assigned to labels1048698 Location Counter (LOCCTR) bull This is a variable that is used to help in the assignment of address LOCCTR not LOCCTR + (instruction length)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 32

SP NOTES

Lecture_no20 Multi-pass assembler with example

Pass 1 Assign addresses to all statements in the program Save the values assigned to all labels for use in Pass 2 Perform some processing of assembler directivesPass 2 Assemble instructions Generate data values defined by BYTE WORD Perform processing of assembler directives not done in Pass 1 Write the object program and the assembly listing

(Detailed pass2 assembler flowchart)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 33

SP NOTES

Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler

The difference between one pass and two pass assemblers are-A one pass assembler passes over the source file exactly once in the same pass collecting the labels resolving future references and doing the actual assembly The difficult part is to resolve future label references and assemble code in one pass

A two pass assembler does two passes over the source file ( the second pass can be over a file generated in the first pass ) In the first pass all it does is looks for label definitions and introduces them in the symbol table In the second pass after the symbol table is complete it does the actual assembly by translating the operations and so on

Assembler Design Machine Dependent Assembler Features instruction formats and addressing modes program relocation

Machine Independent Assembler Features literals symbol-defining statements expressions program blocks control sections and program linking

Object Program

Header Col 1 H Col 2~7 Program name Col 8~13 Starting address (hex) Col 14-19 Length of object program in bytes (hex) Text Col1 T Col2~7 Starting address in this record (hex) Col 8~9 Length of object code in this record in bytes (hex) Col 10~69Object code (69-10+1)6=10 instructions End Col1 E Col 2~7 Address of 16 Col2 first executable instruction (hex) (END program_name)

Absolute ProgramThe program must be loaded at the specified (absolute) address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 34

SP NOTES

Relocatable Program

The actual starting address is not known until load time The program can be loaded at different address of memory

How to Solvea Modification recordb Base relativec Program counter relative

Modification record Col 1 M Col 2-7 Starting location of the address field to be modified relative to the beginning of the program Col 8-9 length of the address field to be modified in halfbytes

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 35

SP NOTES

Lecture_no22 Macro language amp Macro processor

MACRO PROCESSOR

The assembly language programmer often finds it necessary to repeat some statements or block of code several times in a program The block may consist of code to swap sets of registers do some arithmetic operations In this situation the programmer find a macro instruction facility useful Macro instruction (often called macros) are single line abbreviation for group of instructions In employing a macro the programmer essentially defines a single instruction to represent a block of code For every occurrence of this one-line macro instruction in his program the macro processing assembler substitute the entire block

A macro represents a commonly used group of statements in the source pro ramming language The macro processor replaces each macro instruction with the corresponding group of source language statements This is called expanding of macros

Obj

Basic Macro Processor Functions

Macro Definition and Usage

In this section we will present one more example to highlight salient aspects of macro-processor The example is very similar to Intels 8 bit microprocessor assembly language instruction

ExampleMACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 36

Source code (with macro)

Macro Processor

Expanded code

Compiler Assembler

SP NOTES

A macro definition is placed at the start of a program enclosed between the statements MACRO and ENDM A MACRO statement indicates that a macro definition starts while statement ENDM indicates the end of a macro definition Thus a group of statements start ing with MACRO and ending with ENDM constitutes one macro definition unit If many macros are to be defined in a program as many definition

modules will exist at the start of the program Each definition module contains a new operation and defines it to consist of a sequence of assembly language statement In example above INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence

The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement Appearance of a macro name in the mnemonic field amounts to a call on the macro The assembler replaces such a statement by the statement sequence comprising the macro This is known as macro expansion

INCRMT XY

is shown to lead to insertion of the assembly statements

LOAD X

ADD Y

STORE X

in its place All macro calls in a program are expanded in this fashion

Defining a Macro

Let us take another look at the macro definition unit

MACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

The macro header statement indicates the existence of a macro definition unit Absence of the header statement as the first statement of a program or ft first statement following a macro definition unit signals the start of the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 37

SP NOTES

main assembly language program The next statement in the definition unit is die prototype for a macro call This statement names the macro and indicates how the operands in any call on the macro would be written

The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion

Basic Macro Processor Functions

-gtDirectives used during usage of Macro

Macro Indicates begin of Macro MEND indicates end of Macro

-gtPrototype for Macro

Each argument starts with Name and macro Parameter list

MEND

-gtBODY The statement will be generated as the expansion of Macro

Macro Expansion

Each macro invocation statement will be expanded into the statements that form the body of the macro

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 38

SP NOTES

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)

In the definition of macro parameter In the macro invocation argument

Lecture_no23 Machine-Independent Macro Processor Features

MACRO INVOCATION

A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro

macro_name p1 p2 hellip

Difference between macro call and procedure call

Macro call statements of the macro body are expanded each time the macro is invoked

Procedure call statements of the subroutine appear only one regardless of how many times the subroutine is called

Macro parameters

i) Positional parameter

The prototype statement indicates how operands in a macro call would be written These operands are called parameters or arguments All parameters used in the prototype statement have names starting with the special character amp These parameters are known as formal parameters A macro call is written using parameter names which do not start with ft special character amp These are known as actual parameters

The lists of formal and actual parameters also called as formal and actual parameter lists specified in the prototype and macro call statements respectively establish a correspondence between each formal parameter and an actual parameter In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists Thus the first actual parameter in the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 39

SP NOTES

fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again

INCRMT ampA ampB prototype

INCRMT XY macro call

We see that X would be paired with ampA and Y with ampB While expanding a macro call any formal parameter appearing within a model statement is replaced by the corresponding actual parameter This is how expansion of the call INCR XY heads to the following statements

LOAD X

ADD Y

STORE X

Example

STRG DATA1 DATA2 DATA3

GENER DIRECT3

(ii) Keyword parameter

Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter

ExampleSTRG ampa3=DATA1 ampa2=DATA2 ampa1=DATA3

GENER TYPE=DIRECT CHANNEL=3

Features of macro facility

(1)Macro instruction argument

(2)Macro calls within macro(nested macro calls)

(3)Macro instruction defining(macros within macro)

(4)Conditional macro ExpansionLecture_no24 Design of 1-pass macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 40

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Lecture_no13 Assembly language machine language

Assembly LanguageAssembly instructions are just shorthand for machine instructions

Machine language Equivalent assembly1000000100100101 LOAD R1 51000000101000101 LOAD R2 51010000100000110 ADD R0 R1 R21000001000000110 SAVE R0 61111111111111111 HALT (For all assembly instructions that compute a result the first argument is the destination)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 22

SP NOTES

Very easy to write an Assembly Language -gtMachine language translator

Exercise

What would be the assembly instructions to swap the contents of registers 1 amp 2Exercise Solution

STORE 1 R1MOVE R1 R2LOAD R2 1HALT

Machine Language

The machine language for a particular computer is tied to the architecture of the CPU For example G4 Macs have a different machine language than Intel PCs We will look at the machine language of a simple simulated computer

Machine Language ExampleOur computer has 4 registers and 32 memory locations Each instruction is 16 bits Here is a machine language program for our simulated computer

10000001001001011000000101000101101000010000011010000010000001101111111111111111

LOAD contents of memorylocation into register

100000010 RR MMMMMex R0 = Mem[3]100000010 00 00011

STORE contents of registerinto memory location

100000100 RR MMMMMex Mem[4] = R0100000100 00 00100

MOVE contents of oneregister into another register

100100010000 RR RRex R0 = R1100100010000 00 01

ADD contents of 2 registersstore result in third

1010000100 RR RR RRex R0 = R1 + R21010000100 00 01 10

SUBTRACT contents of 2registers store result into third

1010001000 RR RR RRex R0 = R1 ndash R2

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 23

SP NOTES

1010001000 00 01 10

Halt the program 1111111111111111

Lecture_no14 Meaning of Assembly language program types Literals

bullUSING

bullBALR

bullSTART

bullEND

bullBR

bullLiteral

bullLTORG

bullBCT

bullEQU ExampleSTART EQU 10

The directive EQU stands for equals it allows you to give a numerical value to a label In this example the assembler would always replace START with 1016 Hence you could write

START EQU 10ORG STARTGOTO 10

bullORG

The directive ORG does not get translated into any code what it does is to set the address at which the next instruction will be stored

Example

ORG 0GOTO 10

The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000

But it will be placed at the address 0 ORG 0 does not get translated into any code but it affects where the code for GOTO 10 is placed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 24

SP NOTES

bullpseudo-ops vs machine-ops

Lecture_no15 Example on Assembly language program

Assembly language is a low-level programming language for a computer or other programmable device specific to a particular computer architecture in contrast to most high-level programming languages which are generally portable across multiple systems Assembly language is converted into executable machine code by a utility program referred to as an assembler like NASM MASM etc

LABEL OPCODE OPERANDS COMMENTS01 02 Program to multiply an integer by the constant 603 Before execution an integer must be stored in NUMBER0405 ORIG x305006 LEA R2 NUMBER07 LDW R2 R2 008 LEA R1 SIX09 LDW R1 R1 00A AND R3R30 Clear R3 It will0B contain the product0C The inner loop0D 0E AGAIN ADD R3R3R20F ADD R1R1-1 R1 keeps track of10 BRp AGAIN the iterations11 12 HALT13

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 25

SP NOTES

14 NUMBER BLKW 115 SIX FILL x000616 17 END

Figure 71 An assembly language programTwo of the parts (LABEL and COMMENTS) are optionalOpcodes and OperandsTwo of the parts (OPCODE and OPERANDS) are mandatory The OPCODE is a symbolic name for the opcode

The number of operands depends on the operation being performedAGAIN ADD R3R3R2The operands to be added are obtained from register 2 and from register 3 The resultis to be placed in register 3 We represent each of the registers 0 through 7 as R0 R2helliphelliphelliphellip R7

The location whose address is to be read is given the label NUMBER The destination into which that address is to be loaded is register 2LEA R2 NUMBERA literal value must contain a symbol identifying the representation base of the number We use for decimal x for hexadecimal and b for binary

LabelsLabels are symbolic names which are used to identify memory locations that are referred to explicitly in the programThere are two reasons for explicitly referring to a memory location1 The location contains the target of a branch instruction (for example AGAIN in line 0E)2 The location contains a value that is loaded or stored (for example NUMBER line 14 and SIX line 15)The location AGAIN is specifically referenced by the branch instruction in line 10

BRp AGAIN

If the result of ADD R1R1ndash1 is positive (as evidenced by the P condition code being set) then the program branches to the location explicitly referenced as AGAIN to perform another iterationThe value stored in the memory location explicitly referenced as NUMBER is loaded into R2If a location in the program is not explicitly referenced then there is no need to give it a label

CommentsComments are messages intended only for human consumption The purpose of comments is to make the program more comprehensible to the human reader

Pseudo-ops (Assembler Directives)

Actually a more formal name for a pseudo-op is assembler directive They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution Rather the pseudo-op is strictly a message to the assembler to help the assembler in the assembly processORIG

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 26

SP NOTES

ORIG tells the assembler where in memory to place the program In line 05 ORIG x3050 says start with location x3050 As a result the LEA R2NUMBER instruction will be put in location x3050FILLFILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand In line 15 the ninth location in the resultant program is initialized to the value x0006BLKWBLKW tells the assembler to set aside some number of sequential memory locations (ie a BLocK ofWords) in the program The actual number is the operand of the BLKW pseudo-op In line 11 the pseudo-op instructs the assembler to set aside one location in memory

Lecture_no16 surprise test

Lecture_no17 MODULE-2

AssemblerWhy learning AssemblerAssembler or other languages that is the question Why should I learn another language if I alreadylearned other programming languages The best argument while you live in France you are able to getthrough by speaking english but you will never feel at home then and life remains complicated You canget through with this but it is rather inappropriate If things need a hurry you should use the countryslanguageAssemblerTo translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmerDisassenblerTo translate machine code to their assembly source

Cross AssemblerThe assembler may run on one machine and produce object code for another

An assembly is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader

(Fig Assembler)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 27

SP NOTES

For example the externally defined symbols (library program) must be indicated to the loaderthe assembler does not know the address of these symbols and it is up to the loader to find theprograms containing them load them into memory and place the values of these symbols in thecalling program

(Fig Program Translation Steps)

Fundamental assembler functions1048698 Translate mnemonic operation codes their machine language equivalents1048698 Assign machine addresses to symbolic labels used by the programmer

Basic Assembler Functions

Convert mnemonic operation codes to machine language equivalents

Convert symbolic operands to machine addresses (pass 1)

Build machine instructions in the proper format

Convert data constants to internal (machine)representations

Error checking is provided

Write the object program and assembly listing files

Assembler directives

-gtThe assembler can also process assembler directivesbull Assembler directives (or pseudo-instructions) provide instructions to the assembler itself Theyare not translated into machine instructions

-gtPseudo-Instructions Not translated into machine instructions Providing information to the assembler-gtBasic assembler directives-

bull START -Specify name and staring address for the program END- Indicate the end of the source program and (optionally) specify the first executable instruction in

the program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 28

SP NOTES

BYTE -Generate character or hexadecimal constant occupying as many bytes as needed to represent the constant

WORD- Generate one-word integer constant RESB- Reserve the indicated number of bytes for a data area RESW- Reserve the indicated number of words for a data area

Lecture_no18 General design procedure of 360-type Assembler

Syntax Assembly language statements

Assembly language statements are entered one statement per line Each statement follows the following format

[label] mnemonic [operands] [comment]

The fields in the square brackets are optional A basic instruction has two parts the first one is the name of the instruction (or the mnemonic) which is to be executed and the second are the operands or the parameters of the command

An assembly language statement contains the following fields -gt Label Field -it is an identifier amp optional fieldThe maximum lengnth of label differs between assemblers It can be used to define a symbolSome accepts upto 32 characters long other only 4 charactersA label when declared is suffixed by a colon and begins with valid characters(AhelliphellipZ)

Ex-

START LDAA 24 H

Here Label START is equal to the address of the instruction LDAA 24 H -gt MnemonicOpcode -This field contains a mnemonic It stands for operation code or pseudo-opie a machine code instruction it requires additional information ieoperands -gt Operand Field -it specifies either the address or the data that the opcode requires -gt Comment Field allows the programmer to document the software This field is optional and is only printed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 29

SP NOTES

on the source listing for documentation purposes The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character

Types of Assembly language statements

It consists of 3 typeseg

(i)Imperative statement

(ii)Declarative statements

(iii)Assembler directive statements

Advantages amp disadvantages of Assembly language

Lecture_no19 Design Specification of 1-pass assembler with example

Following steps are used for design specification of an assembler-1)Identify the information necessary to perform a task2)Specify the data structure and define format of data structure3)Determine the processing necessary to obtain and maintain the information4)Determine the processing necessary to perform a task

Assembler Design can be done in

ndashSingle pass

ndashTwo pass

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 30

Source program

Pass1

OPTAB SYMTAB

Intermediate file

Pass2

SYMTAB

Object codes

OPTAB

SP NOTES

The Intermediate file include each source statement assigned address and error indicator

Pass I

Separate the symbols mnemonic op-code and operational fields Determine the storage requirement for every assembly language statement and up date the location

counter Build the symbol table (Table that is used to store each label and its corresponding value)

Pass II

Generate object code(perform actual translation)

One-Pass Assembler

The translation performed by an assembler is essentially a collection of substitutions machine operation code

for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines

bullSingle Pass AssemblerndashDoes everything in single passndashCannot resolve the forward referencing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 31

LOCCTR

SP NOTES

(detailed pass1 assembler flowchart)

Internal data structures1048698 the Operation Code Table (OPTAB) bull OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents1048698 the Symbol Table (SYMTAB) bull SYMTAB is used to store values (addresses) assigned to labels1048698 Location Counter (LOCCTR) bull This is a variable that is used to help in the assignment of address LOCCTR not LOCCTR + (instruction length)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 32

SP NOTES

Lecture_no20 Multi-pass assembler with example

Pass 1 Assign addresses to all statements in the program Save the values assigned to all labels for use in Pass 2 Perform some processing of assembler directivesPass 2 Assemble instructions Generate data values defined by BYTE WORD Perform processing of assembler directives not done in Pass 1 Write the object program and the assembly listing

(Detailed pass2 assembler flowchart)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 33

SP NOTES

Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler

The difference between one pass and two pass assemblers are-A one pass assembler passes over the source file exactly once in the same pass collecting the labels resolving future references and doing the actual assembly The difficult part is to resolve future label references and assemble code in one pass

A two pass assembler does two passes over the source file ( the second pass can be over a file generated in the first pass ) In the first pass all it does is looks for label definitions and introduces them in the symbol table In the second pass after the symbol table is complete it does the actual assembly by translating the operations and so on

Assembler Design Machine Dependent Assembler Features instruction formats and addressing modes program relocation

Machine Independent Assembler Features literals symbol-defining statements expressions program blocks control sections and program linking

Object Program

Header Col 1 H Col 2~7 Program name Col 8~13 Starting address (hex) Col 14-19 Length of object program in bytes (hex) Text Col1 T Col2~7 Starting address in this record (hex) Col 8~9 Length of object code in this record in bytes (hex) Col 10~69Object code (69-10+1)6=10 instructions End Col1 E Col 2~7 Address of 16 Col2 first executable instruction (hex) (END program_name)

Absolute ProgramThe program must be loaded at the specified (absolute) address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 34

SP NOTES

Relocatable Program

The actual starting address is not known until load time The program can be loaded at different address of memory

How to Solvea Modification recordb Base relativec Program counter relative

Modification record Col 1 M Col 2-7 Starting location of the address field to be modified relative to the beginning of the program Col 8-9 length of the address field to be modified in halfbytes

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 35

SP NOTES

Lecture_no22 Macro language amp Macro processor

MACRO PROCESSOR

The assembly language programmer often finds it necessary to repeat some statements or block of code several times in a program The block may consist of code to swap sets of registers do some arithmetic operations In this situation the programmer find a macro instruction facility useful Macro instruction (often called macros) are single line abbreviation for group of instructions In employing a macro the programmer essentially defines a single instruction to represent a block of code For every occurrence of this one-line macro instruction in his program the macro processing assembler substitute the entire block

A macro represents a commonly used group of statements in the source pro ramming language The macro processor replaces each macro instruction with the corresponding group of source language statements This is called expanding of macros

Obj

Basic Macro Processor Functions

Macro Definition and Usage

In this section we will present one more example to highlight salient aspects of macro-processor The example is very similar to Intels 8 bit microprocessor assembly language instruction

ExampleMACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 36

Source code (with macro)

Macro Processor

Expanded code

Compiler Assembler

SP NOTES

A macro definition is placed at the start of a program enclosed between the statements MACRO and ENDM A MACRO statement indicates that a macro definition starts while statement ENDM indicates the end of a macro definition Thus a group of statements start ing with MACRO and ending with ENDM constitutes one macro definition unit If many macros are to be defined in a program as many definition

modules will exist at the start of the program Each definition module contains a new operation and defines it to consist of a sequence of assembly language statement In example above INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence

The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement Appearance of a macro name in the mnemonic field amounts to a call on the macro The assembler replaces such a statement by the statement sequence comprising the macro This is known as macro expansion

INCRMT XY

is shown to lead to insertion of the assembly statements

LOAD X

ADD Y

STORE X

in its place All macro calls in a program are expanded in this fashion

Defining a Macro

Let us take another look at the macro definition unit

MACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

The macro header statement indicates the existence of a macro definition unit Absence of the header statement as the first statement of a program or ft first statement following a macro definition unit signals the start of the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 37

SP NOTES

main assembly language program The next statement in the definition unit is die prototype for a macro call This statement names the macro and indicates how the operands in any call on the macro would be written

The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion

Basic Macro Processor Functions

-gtDirectives used during usage of Macro

Macro Indicates begin of Macro MEND indicates end of Macro

-gtPrototype for Macro

Each argument starts with Name and macro Parameter list

MEND

-gtBODY The statement will be generated as the expansion of Macro

Macro Expansion

Each macro invocation statement will be expanded into the statements that form the body of the macro

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 38

SP NOTES

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)

In the definition of macro parameter In the macro invocation argument

Lecture_no23 Machine-Independent Macro Processor Features

MACRO INVOCATION

A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro

macro_name p1 p2 hellip

Difference between macro call and procedure call

Macro call statements of the macro body are expanded each time the macro is invoked

Procedure call statements of the subroutine appear only one regardless of how many times the subroutine is called

Macro parameters

i) Positional parameter

The prototype statement indicates how operands in a macro call would be written These operands are called parameters or arguments All parameters used in the prototype statement have names starting with the special character amp These parameters are known as formal parameters A macro call is written using parameter names which do not start with ft special character amp These are known as actual parameters

The lists of formal and actual parameters also called as formal and actual parameter lists specified in the prototype and macro call statements respectively establish a correspondence between each formal parameter and an actual parameter In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists Thus the first actual parameter in the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 39

SP NOTES

fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again

INCRMT ampA ampB prototype

INCRMT XY macro call

We see that X would be paired with ampA and Y with ampB While expanding a macro call any formal parameter appearing within a model statement is replaced by the corresponding actual parameter This is how expansion of the call INCR XY heads to the following statements

LOAD X

ADD Y

STORE X

Example

STRG DATA1 DATA2 DATA3

GENER DIRECT3

(ii) Keyword parameter

Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter

ExampleSTRG ampa3=DATA1 ampa2=DATA2 ampa1=DATA3

GENER TYPE=DIRECT CHANNEL=3

Features of macro facility

(1)Macro instruction argument

(2)Macro calls within macro(nested macro calls)

(3)Macro instruction defining(macros within macro)

(4)Conditional macro ExpansionLecture_no24 Design of 1-pass macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 40

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Very easy to write an Assembly Language -gtMachine language translator

Exercise

What would be the assembly instructions to swap the contents of registers 1 amp 2Exercise Solution

STORE 1 R1MOVE R1 R2LOAD R2 1HALT

Machine Language

The machine language for a particular computer is tied to the architecture of the CPU For example G4 Macs have a different machine language than Intel PCs We will look at the machine language of a simple simulated computer

Machine Language ExampleOur computer has 4 registers and 32 memory locations Each instruction is 16 bits Here is a machine language program for our simulated computer

10000001001001011000000101000101101000010000011010000010000001101111111111111111

LOAD contents of memorylocation into register

100000010 RR MMMMMex R0 = Mem[3]100000010 00 00011

STORE contents of registerinto memory location

100000100 RR MMMMMex Mem[4] = R0100000100 00 00100

MOVE contents of oneregister into another register

100100010000 RR RRex R0 = R1100100010000 00 01

ADD contents of 2 registersstore result in third

1010000100 RR RR RRex R0 = R1 + R21010000100 00 01 10

SUBTRACT contents of 2registers store result into third

1010001000 RR RR RRex R0 = R1 ndash R2

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 23

SP NOTES

1010001000 00 01 10

Halt the program 1111111111111111

Lecture_no14 Meaning of Assembly language program types Literals

bullUSING

bullBALR

bullSTART

bullEND

bullBR

bullLiteral

bullLTORG

bullBCT

bullEQU ExampleSTART EQU 10

The directive EQU stands for equals it allows you to give a numerical value to a label In this example the assembler would always replace START with 1016 Hence you could write

START EQU 10ORG STARTGOTO 10

bullORG

The directive ORG does not get translated into any code what it does is to set the address at which the next instruction will be stored

Example

ORG 0GOTO 10

The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000

But it will be placed at the address 0 ORG 0 does not get translated into any code but it affects where the code for GOTO 10 is placed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 24

SP NOTES

bullpseudo-ops vs machine-ops

Lecture_no15 Example on Assembly language program

Assembly language is a low-level programming language for a computer or other programmable device specific to a particular computer architecture in contrast to most high-level programming languages which are generally portable across multiple systems Assembly language is converted into executable machine code by a utility program referred to as an assembler like NASM MASM etc

LABEL OPCODE OPERANDS COMMENTS01 02 Program to multiply an integer by the constant 603 Before execution an integer must be stored in NUMBER0405 ORIG x305006 LEA R2 NUMBER07 LDW R2 R2 008 LEA R1 SIX09 LDW R1 R1 00A AND R3R30 Clear R3 It will0B contain the product0C The inner loop0D 0E AGAIN ADD R3R3R20F ADD R1R1-1 R1 keeps track of10 BRp AGAIN the iterations11 12 HALT13

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 25

SP NOTES

14 NUMBER BLKW 115 SIX FILL x000616 17 END

Figure 71 An assembly language programTwo of the parts (LABEL and COMMENTS) are optionalOpcodes and OperandsTwo of the parts (OPCODE and OPERANDS) are mandatory The OPCODE is a symbolic name for the opcode

The number of operands depends on the operation being performedAGAIN ADD R3R3R2The operands to be added are obtained from register 2 and from register 3 The resultis to be placed in register 3 We represent each of the registers 0 through 7 as R0 R2helliphelliphelliphellip R7

The location whose address is to be read is given the label NUMBER The destination into which that address is to be loaded is register 2LEA R2 NUMBERA literal value must contain a symbol identifying the representation base of the number We use for decimal x for hexadecimal and b for binary

LabelsLabels are symbolic names which are used to identify memory locations that are referred to explicitly in the programThere are two reasons for explicitly referring to a memory location1 The location contains the target of a branch instruction (for example AGAIN in line 0E)2 The location contains a value that is loaded or stored (for example NUMBER line 14 and SIX line 15)The location AGAIN is specifically referenced by the branch instruction in line 10

BRp AGAIN

If the result of ADD R1R1ndash1 is positive (as evidenced by the P condition code being set) then the program branches to the location explicitly referenced as AGAIN to perform another iterationThe value stored in the memory location explicitly referenced as NUMBER is loaded into R2If a location in the program is not explicitly referenced then there is no need to give it a label

CommentsComments are messages intended only for human consumption The purpose of comments is to make the program more comprehensible to the human reader

Pseudo-ops (Assembler Directives)

Actually a more formal name for a pseudo-op is assembler directive They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution Rather the pseudo-op is strictly a message to the assembler to help the assembler in the assembly processORIG

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 26

SP NOTES

ORIG tells the assembler where in memory to place the program In line 05 ORIG x3050 says start with location x3050 As a result the LEA R2NUMBER instruction will be put in location x3050FILLFILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand In line 15 the ninth location in the resultant program is initialized to the value x0006BLKWBLKW tells the assembler to set aside some number of sequential memory locations (ie a BLocK ofWords) in the program The actual number is the operand of the BLKW pseudo-op In line 11 the pseudo-op instructs the assembler to set aside one location in memory

Lecture_no16 surprise test

Lecture_no17 MODULE-2

AssemblerWhy learning AssemblerAssembler or other languages that is the question Why should I learn another language if I alreadylearned other programming languages The best argument while you live in France you are able to getthrough by speaking english but you will never feel at home then and life remains complicated You canget through with this but it is rather inappropriate If things need a hurry you should use the countryslanguageAssemblerTo translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmerDisassenblerTo translate machine code to their assembly source

Cross AssemblerThe assembler may run on one machine and produce object code for another

An assembly is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader

(Fig Assembler)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 27

SP NOTES

For example the externally defined symbols (library program) must be indicated to the loaderthe assembler does not know the address of these symbols and it is up to the loader to find theprograms containing them load them into memory and place the values of these symbols in thecalling program

(Fig Program Translation Steps)

Fundamental assembler functions1048698 Translate mnemonic operation codes their machine language equivalents1048698 Assign machine addresses to symbolic labels used by the programmer

Basic Assembler Functions

Convert mnemonic operation codes to machine language equivalents

Convert symbolic operands to machine addresses (pass 1)

Build machine instructions in the proper format

Convert data constants to internal (machine)representations

Error checking is provided

Write the object program and assembly listing files

Assembler directives

-gtThe assembler can also process assembler directivesbull Assembler directives (or pseudo-instructions) provide instructions to the assembler itself Theyare not translated into machine instructions

-gtPseudo-Instructions Not translated into machine instructions Providing information to the assembler-gtBasic assembler directives-

bull START -Specify name and staring address for the program END- Indicate the end of the source program and (optionally) specify the first executable instruction in

the program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 28

SP NOTES

BYTE -Generate character or hexadecimal constant occupying as many bytes as needed to represent the constant

WORD- Generate one-word integer constant RESB- Reserve the indicated number of bytes for a data area RESW- Reserve the indicated number of words for a data area

Lecture_no18 General design procedure of 360-type Assembler

Syntax Assembly language statements

Assembly language statements are entered one statement per line Each statement follows the following format

[label] mnemonic [operands] [comment]

The fields in the square brackets are optional A basic instruction has two parts the first one is the name of the instruction (or the mnemonic) which is to be executed and the second are the operands or the parameters of the command

An assembly language statement contains the following fields -gt Label Field -it is an identifier amp optional fieldThe maximum lengnth of label differs between assemblers It can be used to define a symbolSome accepts upto 32 characters long other only 4 charactersA label when declared is suffixed by a colon and begins with valid characters(AhelliphellipZ)

Ex-

START LDAA 24 H

Here Label START is equal to the address of the instruction LDAA 24 H -gt MnemonicOpcode -This field contains a mnemonic It stands for operation code or pseudo-opie a machine code instruction it requires additional information ieoperands -gt Operand Field -it specifies either the address or the data that the opcode requires -gt Comment Field allows the programmer to document the software This field is optional and is only printed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 29

SP NOTES

on the source listing for documentation purposes The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character

Types of Assembly language statements

It consists of 3 typeseg

(i)Imperative statement

(ii)Declarative statements

(iii)Assembler directive statements

Advantages amp disadvantages of Assembly language

Lecture_no19 Design Specification of 1-pass assembler with example

Following steps are used for design specification of an assembler-1)Identify the information necessary to perform a task2)Specify the data structure and define format of data structure3)Determine the processing necessary to obtain and maintain the information4)Determine the processing necessary to perform a task

Assembler Design can be done in

ndashSingle pass

ndashTwo pass

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 30

Source program

Pass1

OPTAB SYMTAB

Intermediate file

Pass2

SYMTAB

Object codes

OPTAB

SP NOTES

The Intermediate file include each source statement assigned address and error indicator

Pass I

Separate the symbols mnemonic op-code and operational fields Determine the storage requirement for every assembly language statement and up date the location

counter Build the symbol table (Table that is used to store each label and its corresponding value)

Pass II

Generate object code(perform actual translation)

One-Pass Assembler

The translation performed by an assembler is essentially a collection of substitutions machine operation code

for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines

bullSingle Pass AssemblerndashDoes everything in single passndashCannot resolve the forward referencing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 31

LOCCTR

SP NOTES

(detailed pass1 assembler flowchart)

Internal data structures1048698 the Operation Code Table (OPTAB) bull OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents1048698 the Symbol Table (SYMTAB) bull SYMTAB is used to store values (addresses) assigned to labels1048698 Location Counter (LOCCTR) bull This is a variable that is used to help in the assignment of address LOCCTR not LOCCTR + (instruction length)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 32

SP NOTES

Lecture_no20 Multi-pass assembler with example

Pass 1 Assign addresses to all statements in the program Save the values assigned to all labels for use in Pass 2 Perform some processing of assembler directivesPass 2 Assemble instructions Generate data values defined by BYTE WORD Perform processing of assembler directives not done in Pass 1 Write the object program and the assembly listing

(Detailed pass2 assembler flowchart)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 33

SP NOTES

Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler

The difference between one pass and two pass assemblers are-A one pass assembler passes over the source file exactly once in the same pass collecting the labels resolving future references and doing the actual assembly The difficult part is to resolve future label references and assemble code in one pass

A two pass assembler does two passes over the source file ( the second pass can be over a file generated in the first pass ) In the first pass all it does is looks for label definitions and introduces them in the symbol table In the second pass after the symbol table is complete it does the actual assembly by translating the operations and so on

Assembler Design Machine Dependent Assembler Features instruction formats and addressing modes program relocation

Machine Independent Assembler Features literals symbol-defining statements expressions program blocks control sections and program linking

Object Program

Header Col 1 H Col 2~7 Program name Col 8~13 Starting address (hex) Col 14-19 Length of object program in bytes (hex) Text Col1 T Col2~7 Starting address in this record (hex) Col 8~9 Length of object code in this record in bytes (hex) Col 10~69Object code (69-10+1)6=10 instructions End Col1 E Col 2~7 Address of 16 Col2 first executable instruction (hex) (END program_name)

Absolute ProgramThe program must be loaded at the specified (absolute) address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 34

SP NOTES

Relocatable Program

The actual starting address is not known until load time The program can be loaded at different address of memory

How to Solvea Modification recordb Base relativec Program counter relative

Modification record Col 1 M Col 2-7 Starting location of the address field to be modified relative to the beginning of the program Col 8-9 length of the address field to be modified in halfbytes

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 35

SP NOTES

Lecture_no22 Macro language amp Macro processor

MACRO PROCESSOR

The assembly language programmer often finds it necessary to repeat some statements or block of code several times in a program The block may consist of code to swap sets of registers do some arithmetic operations In this situation the programmer find a macro instruction facility useful Macro instruction (often called macros) are single line abbreviation for group of instructions In employing a macro the programmer essentially defines a single instruction to represent a block of code For every occurrence of this one-line macro instruction in his program the macro processing assembler substitute the entire block

A macro represents a commonly used group of statements in the source pro ramming language The macro processor replaces each macro instruction with the corresponding group of source language statements This is called expanding of macros

Obj

Basic Macro Processor Functions

Macro Definition and Usage

In this section we will present one more example to highlight salient aspects of macro-processor The example is very similar to Intels 8 bit microprocessor assembly language instruction

ExampleMACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 36

Source code (with macro)

Macro Processor

Expanded code

Compiler Assembler

SP NOTES

A macro definition is placed at the start of a program enclosed between the statements MACRO and ENDM A MACRO statement indicates that a macro definition starts while statement ENDM indicates the end of a macro definition Thus a group of statements start ing with MACRO and ending with ENDM constitutes one macro definition unit If many macros are to be defined in a program as many definition

modules will exist at the start of the program Each definition module contains a new operation and defines it to consist of a sequence of assembly language statement In example above INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence

The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement Appearance of a macro name in the mnemonic field amounts to a call on the macro The assembler replaces such a statement by the statement sequence comprising the macro This is known as macro expansion

INCRMT XY

is shown to lead to insertion of the assembly statements

LOAD X

ADD Y

STORE X

in its place All macro calls in a program are expanded in this fashion

Defining a Macro

Let us take another look at the macro definition unit

MACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

The macro header statement indicates the existence of a macro definition unit Absence of the header statement as the first statement of a program or ft first statement following a macro definition unit signals the start of the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 37

SP NOTES

main assembly language program The next statement in the definition unit is die prototype for a macro call This statement names the macro and indicates how the operands in any call on the macro would be written

The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion

Basic Macro Processor Functions

-gtDirectives used during usage of Macro

Macro Indicates begin of Macro MEND indicates end of Macro

-gtPrototype for Macro

Each argument starts with Name and macro Parameter list

MEND

-gtBODY The statement will be generated as the expansion of Macro

Macro Expansion

Each macro invocation statement will be expanded into the statements that form the body of the macro

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 38

SP NOTES

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)

In the definition of macro parameter In the macro invocation argument

Lecture_no23 Machine-Independent Macro Processor Features

MACRO INVOCATION

A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro

macro_name p1 p2 hellip

Difference between macro call and procedure call

Macro call statements of the macro body are expanded each time the macro is invoked

Procedure call statements of the subroutine appear only one regardless of how many times the subroutine is called

Macro parameters

i) Positional parameter

The prototype statement indicates how operands in a macro call would be written These operands are called parameters or arguments All parameters used in the prototype statement have names starting with the special character amp These parameters are known as formal parameters A macro call is written using parameter names which do not start with ft special character amp These are known as actual parameters

The lists of formal and actual parameters also called as formal and actual parameter lists specified in the prototype and macro call statements respectively establish a correspondence between each formal parameter and an actual parameter In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists Thus the first actual parameter in the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 39

SP NOTES

fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again

INCRMT ampA ampB prototype

INCRMT XY macro call

We see that X would be paired with ampA and Y with ampB While expanding a macro call any formal parameter appearing within a model statement is replaced by the corresponding actual parameter This is how expansion of the call INCR XY heads to the following statements

LOAD X

ADD Y

STORE X

Example

STRG DATA1 DATA2 DATA3

GENER DIRECT3

(ii) Keyword parameter

Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter

ExampleSTRG ampa3=DATA1 ampa2=DATA2 ampa1=DATA3

GENER TYPE=DIRECT CHANNEL=3

Features of macro facility

(1)Macro instruction argument

(2)Macro calls within macro(nested macro calls)

(3)Macro instruction defining(macros within macro)

(4)Conditional macro ExpansionLecture_no24 Design of 1-pass macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 40

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

1010001000 00 01 10

Halt the program 1111111111111111

Lecture_no14 Meaning of Assembly language program types Literals

bullUSING

bullBALR

bullSTART

bullEND

bullBR

bullLiteral

bullLTORG

bullBCT

bullEQU ExampleSTART EQU 10

The directive EQU stands for equals it allows you to give a numerical value to a label In this example the assembler would always replace START with 1016 Hence you could write

START EQU 10ORG STARTGOTO 10

bullORG

The directive ORG does not get translated into any code what it does is to set the address at which the next instruction will be stored

Example

ORG 0GOTO 10

The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000

But it will be placed at the address 0 ORG 0 does not get translated into any code but it affects where the code for GOTO 10 is placed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 24

SP NOTES

bullpseudo-ops vs machine-ops

Lecture_no15 Example on Assembly language program

Assembly language is a low-level programming language for a computer or other programmable device specific to a particular computer architecture in contrast to most high-level programming languages which are generally portable across multiple systems Assembly language is converted into executable machine code by a utility program referred to as an assembler like NASM MASM etc

LABEL OPCODE OPERANDS COMMENTS01 02 Program to multiply an integer by the constant 603 Before execution an integer must be stored in NUMBER0405 ORIG x305006 LEA R2 NUMBER07 LDW R2 R2 008 LEA R1 SIX09 LDW R1 R1 00A AND R3R30 Clear R3 It will0B contain the product0C The inner loop0D 0E AGAIN ADD R3R3R20F ADD R1R1-1 R1 keeps track of10 BRp AGAIN the iterations11 12 HALT13

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 25

SP NOTES

14 NUMBER BLKW 115 SIX FILL x000616 17 END

Figure 71 An assembly language programTwo of the parts (LABEL and COMMENTS) are optionalOpcodes and OperandsTwo of the parts (OPCODE and OPERANDS) are mandatory The OPCODE is a symbolic name for the opcode

The number of operands depends on the operation being performedAGAIN ADD R3R3R2The operands to be added are obtained from register 2 and from register 3 The resultis to be placed in register 3 We represent each of the registers 0 through 7 as R0 R2helliphelliphelliphellip R7

The location whose address is to be read is given the label NUMBER The destination into which that address is to be loaded is register 2LEA R2 NUMBERA literal value must contain a symbol identifying the representation base of the number We use for decimal x for hexadecimal and b for binary

LabelsLabels are symbolic names which are used to identify memory locations that are referred to explicitly in the programThere are two reasons for explicitly referring to a memory location1 The location contains the target of a branch instruction (for example AGAIN in line 0E)2 The location contains a value that is loaded or stored (for example NUMBER line 14 and SIX line 15)The location AGAIN is specifically referenced by the branch instruction in line 10

BRp AGAIN

If the result of ADD R1R1ndash1 is positive (as evidenced by the P condition code being set) then the program branches to the location explicitly referenced as AGAIN to perform another iterationThe value stored in the memory location explicitly referenced as NUMBER is loaded into R2If a location in the program is not explicitly referenced then there is no need to give it a label

CommentsComments are messages intended only for human consumption The purpose of comments is to make the program more comprehensible to the human reader

Pseudo-ops (Assembler Directives)

Actually a more formal name for a pseudo-op is assembler directive They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution Rather the pseudo-op is strictly a message to the assembler to help the assembler in the assembly processORIG

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 26

SP NOTES

ORIG tells the assembler where in memory to place the program In line 05 ORIG x3050 says start with location x3050 As a result the LEA R2NUMBER instruction will be put in location x3050FILLFILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand In line 15 the ninth location in the resultant program is initialized to the value x0006BLKWBLKW tells the assembler to set aside some number of sequential memory locations (ie a BLocK ofWords) in the program The actual number is the operand of the BLKW pseudo-op In line 11 the pseudo-op instructs the assembler to set aside one location in memory

Lecture_no16 surprise test

Lecture_no17 MODULE-2

AssemblerWhy learning AssemblerAssembler or other languages that is the question Why should I learn another language if I alreadylearned other programming languages The best argument while you live in France you are able to getthrough by speaking english but you will never feel at home then and life remains complicated You canget through with this but it is rather inappropriate If things need a hurry you should use the countryslanguageAssemblerTo translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmerDisassenblerTo translate machine code to their assembly source

Cross AssemblerThe assembler may run on one machine and produce object code for another

An assembly is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader

(Fig Assembler)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 27

SP NOTES

For example the externally defined symbols (library program) must be indicated to the loaderthe assembler does not know the address of these symbols and it is up to the loader to find theprograms containing them load them into memory and place the values of these symbols in thecalling program

(Fig Program Translation Steps)

Fundamental assembler functions1048698 Translate mnemonic operation codes their machine language equivalents1048698 Assign machine addresses to symbolic labels used by the programmer

Basic Assembler Functions

Convert mnemonic operation codes to machine language equivalents

Convert symbolic operands to machine addresses (pass 1)

Build machine instructions in the proper format

Convert data constants to internal (machine)representations

Error checking is provided

Write the object program and assembly listing files

Assembler directives

-gtThe assembler can also process assembler directivesbull Assembler directives (or pseudo-instructions) provide instructions to the assembler itself Theyare not translated into machine instructions

-gtPseudo-Instructions Not translated into machine instructions Providing information to the assembler-gtBasic assembler directives-

bull START -Specify name and staring address for the program END- Indicate the end of the source program and (optionally) specify the first executable instruction in

the program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 28

SP NOTES

BYTE -Generate character or hexadecimal constant occupying as many bytes as needed to represent the constant

WORD- Generate one-word integer constant RESB- Reserve the indicated number of bytes for a data area RESW- Reserve the indicated number of words for a data area

Lecture_no18 General design procedure of 360-type Assembler

Syntax Assembly language statements

Assembly language statements are entered one statement per line Each statement follows the following format

[label] mnemonic [operands] [comment]

The fields in the square brackets are optional A basic instruction has two parts the first one is the name of the instruction (or the mnemonic) which is to be executed and the second are the operands or the parameters of the command

An assembly language statement contains the following fields -gt Label Field -it is an identifier amp optional fieldThe maximum lengnth of label differs between assemblers It can be used to define a symbolSome accepts upto 32 characters long other only 4 charactersA label when declared is suffixed by a colon and begins with valid characters(AhelliphellipZ)

Ex-

START LDAA 24 H

Here Label START is equal to the address of the instruction LDAA 24 H -gt MnemonicOpcode -This field contains a mnemonic It stands for operation code or pseudo-opie a machine code instruction it requires additional information ieoperands -gt Operand Field -it specifies either the address or the data that the opcode requires -gt Comment Field allows the programmer to document the software This field is optional and is only printed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 29

SP NOTES

on the source listing for documentation purposes The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character

Types of Assembly language statements

It consists of 3 typeseg

(i)Imperative statement

(ii)Declarative statements

(iii)Assembler directive statements

Advantages amp disadvantages of Assembly language

Lecture_no19 Design Specification of 1-pass assembler with example

Following steps are used for design specification of an assembler-1)Identify the information necessary to perform a task2)Specify the data structure and define format of data structure3)Determine the processing necessary to obtain and maintain the information4)Determine the processing necessary to perform a task

Assembler Design can be done in

ndashSingle pass

ndashTwo pass

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 30

Source program

Pass1

OPTAB SYMTAB

Intermediate file

Pass2

SYMTAB

Object codes

OPTAB

SP NOTES

The Intermediate file include each source statement assigned address and error indicator

Pass I

Separate the symbols mnemonic op-code and operational fields Determine the storage requirement for every assembly language statement and up date the location

counter Build the symbol table (Table that is used to store each label and its corresponding value)

Pass II

Generate object code(perform actual translation)

One-Pass Assembler

The translation performed by an assembler is essentially a collection of substitutions machine operation code

for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines

bullSingle Pass AssemblerndashDoes everything in single passndashCannot resolve the forward referencing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 31

LOCCTR

SP NOTES

(detailed pass1 assembler flowchart)

Internal data structures1048698 the Operation Code Table (OPTAB) bull OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents1048698 the Symbol Table (SYMTAB) bull SYMTAB is used to store values (addresses) assigned to labels1048698 Location Counter (LOCCTR) bull This is a variable that is used to help in the assignment of address LOCCTR not LOCCTR + (instruction length)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 32

SP NOTES

Lecture_no20 Multi-pass assembler with example

Pass 1 Assign addresses to all statements in the program Save the values assigned to all labels for use in Pass 2 Perform some processing of assembler directivesPass 2 Assemble instructions Generate data values defined by BYTE WORD Perform processing of assembler directives not done in Pass 1 Write the object program and the assembly listing

(Detailed pass2 assembler flowchart)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 33

SP NOTES

Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler

The difference between one pass and two pass assemblers are-A one pass assembler passes over the source file exactly once in the same pass collecting the labels resolving future references and doing the actual assembly The difficult part is to resolve future label references and assemble code in one pass

A two pass assembler does two passes over the source file ( the second pass can be over a file generated in the first pass ) In the first pass all it does is looks for label definitions and introduces them in the symbol table In the second pass after the symbol table is complete it does the actual assembly by translating the operations and so on

Assembler Design Machine Dependent Assembler Features instruction formats and addressing modes program relocation

Machine Independent Assembler Features literals symbol-defining statements expressions program blocks control sections and program linking

Object Program

Header Col 1 H Col 2~7 Program name Col 8~13 Starting address (hex) Col 14-19 Length of object program in bytes (hex) Text Col1 T Col2~7 Starting address in this record (hex) Col 8~9 Length of object code in this record in bytes (hex) Col 10~69Object code (69-10+1)6=10 instructions End Col1 E Col 2~7 Address of 16 Col2 first executable instruction (hex) (END program_name)

Absolute ProgramThe program must be loaded at the specified (absolute) address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 34

SP NOTES

Relocatable Program

The actual starting address is not known until load time The program can be loaded at different address of memory

How to Solvea Modification recordb Base relativec Program counter relative

Modification record Col 1 M Col 2-7 Starting location of the address field to be modified relative to the beginning of the program Col 8-9 length of the address field to be modified in halfbytes

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 35

SP NOTES

Lecture_no22 Macro language amp Macro processor

MACRO PROCESSOR

The assembly language programmer often finds it necessary to repeat some statements or block of code several times in a program The block may consist of code to swap sets of registers do some arithmetic operations In this situation the programmer find a macro instruction facility useful Macro instruction (often called macros) are single line abbreviation for group of instructions In employing a macro the programmer essentially defines a single instruction to represent a block of code For every occurrence of this one-line macro instruction in his program the macro processing assembler substitute the entire block

A macro represents a commonly used group of statements in the source pro ramming language The macro processor replaces each macro instruction with the corresponding group of source language statements This is called expanding of macros

Obj

Basic Macro Processor Functions

Macro Definition and Usage

In this section we will present one more example to highlight salient aspects of macro-processor The example is very similar to Intels 8 bit microprocessor assembly language instruction

ExampleMACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 36

Source code (with macro)

Macro Processor

Expanded code

Compiler Assembler

SP NOTES

A macro definition is placed at the start of a program enclosed between the statements MACRO and ENDM A MACRO statement indicates that a macro definition starts while statement ENDM indicates the end of a macro definition Thus a group of statements start ing with MACRO and ending with ENDM constitutes one macro definition unit If many macros are to be defined in a program as many definition

modules will exist at the start of the program Each definition module contains a new operation and defines it to consist of a sequence of assembly language statement In example above INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence

The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement Appearance of a macro name in the mnemonic field amounts to a call on the macro The assembler replaces such a statement by the statement sequence comprising the macro This is known as macro expansion

INCRMT XY

is shown to lead to insertion of the assembly statements

LOAD X

ADD Y

STORE X

in its place All macro calls in a program are expanded in this fashion

Defining a Macro

Let us take another look at the macro definition unit

MACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

The macro header statement indicates the existence of a macro definition unit Absence of the header statement as the first statement of a program or ft first statement following a macro definition unit signals the start of the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 37

SP NOTES

main assembly language program The next statement in the definition unit is die prototype for a macro call This statement names the macro and indicates how the operands in any call on the macro would be written

The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion

Basic Macro Processor Functions

-gtDirectives used during usage of Macro

Macro Indicates begin of Macro MEND indicates end of Macro

-gtPrototype for Macro

Each argument starts with Name and macro Parameter list

MEND

-gtBODY The statement will be generated as the expansion of Macro

Macro Expansion

Each macro invocation statement will be expanded into the statements that form the body of the macro

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 38

SP NOTES

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)

In the definition of macro parameter In the macro invocation argument

Lecture_no23 Machine-Independent Macro Processor Features

MACRO INVOCATION

A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro

macro_name p1 p2 hellip

Difference between macro call and procedure call

Macro call statements of the macro body are expanded each time the macro is invoked

Procedure call statements of the subroutine appear only one regardless of how many times the subroutine is called

Macro parameters

i) Positional parameter

The prototype statement indicates how operands in a macro call would be written These operands are called parameters or arguments All parameters used in the prototype statement have names starting with the special character amp These parameters are known as formal parameters A macro call is written using parameter names which do not start with ft special character amp These are known as actual parameters

The lists of formal and actual parameters also called as formal and actual parameter lists specified in the prototype and macro call statements respectively establish a correspondence between each formal parameter and an actual parameter In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists Thus the first actual parameter in the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 39

SP NOTES

fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again

INCRMT ampA ampB prototype

INCRMT XY macro call

We see that X would be paired with ampA and Y with ampB While expanding a macro call any formal parameter appearing within a model statement is replaced by the corresponding actual parameter This is how expansion of the call INCR XY heads to the following statements

LOAD X

ADD Y

STORE X

Example

STRG DATA1 DATA2 DATA3

GENER DIRECT3

(ii) Keyword parameter

Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter

ExampleSTRG ampa3=DATA1 ampa2=DATA2 ampa1=DATA3

GENER TYPE=DIRECT CHANNEL=3

Features of macro facility

(1)Macro instruction argument

(2)Macro calls within macro(nested macro calls)

(3)Macro instruction defining(macros within macro)

(4)Conditional macro ExpansionLecture_no24 Design of 1-pass macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 40

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

bullpseudo-ops vs machine-ops

Lecture_no15 Example on Assembly language program

Assembly language is a low-level programming language for a computer or other programmable device specific to a particular computer architecture in contrast to most high-level programming languages which are generally portable across multiple systems Assembly language is converted into executable machine code by a utility program referred to as an assembler like NASM MASM etc

LABEL OPCODE OPERANDS COMMENTS01 02 Program to multiply an integer by the constant 603 Before execution an integer must be stored in NUMBER0405 ORIG x305006 LEA R2 NUMBER07 LDW R2 R2 008 LEA R1 SIX09 LDW R1 R1 00A AND R3R30 Clear R3 It will0B contain the product0C The inner loop0D 0E AGAIN ADD R3R3R20F ADD R1R1-1 R1 keeps track of10 BRp AGAIN the iterations11 12 HALT13

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 25

SP NOTES

14 NUMBER BLKW 115 SIX FILL x000616 17 END

Figure 71 An assembly language programTwo of the parts (LABEL and COMMENTS) are optionalOpcodes and OperandsTwo of the parts (OPCODE and OPERANDS) are mandatory The OPCODE is a symbolic name for the opcode

The number of operands depends on the operation being performedAGAIN ADD R3R3R2The operands to be added are obtained from register 2 and from register 3 The resultis to be placed in register 3 We represent each of the registers 0 through 7 as R0 R2helliphelliphelliphellip R7

The location whose address is to be read is given the label NUMBER The destination into which that address is to be loaded is register 2LEA R2 NUMBERA literal value must contain a symbol identifying the representation base of the number We use for decimal x for hexadecimal and b for binary

LabelsLabels are symbolic names which are used to identify memory locations that are referred to explicitly in the programThere are two reasons for explicitly referring to a memory location1 The location contains the target of a branch instruction (for example AGAIN in line 0E)2 The location contains a value that is loaded or stored (for example NUMBER line 14 and SIX line 15)The location AGAIN is specifically referenced by the branch instruction in line 10

BRp AGAIN

If the result of ADD R1R1ndash1 is positive (as evidenced by the P condition code being set) then the program branches to the location explicitly referenced as AGAIN to perform another iterationThe value stored in the memory location explicitly referenced as NUMBER is loaded into R2If a location in the program is not explicitly referenced then there is no need to give it a label

CommentsComments are messages intended only for human consumption The purpose of comments is to make the program more comprehensible to the human reader

Pseudo-ops (Assembler Directives)

Actually a more formal name for a pseudo-op is assembler directive They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution Rather the pseudo-op is strictly a message to the assembler to help the assembler in the assembly processORIG

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 26

SP NOTES

ORIG tells the assembler where in memory to place the program In line 05 ORIG x3050 says start with location x3050 As a result the LEA R2NUMBER instruction will be put in location x3050FILLFILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand In line 15 the ninth location in the resultant program is initialized to the value x0006BLKWBLKW tells the assembler to set aside some number of sequential memory locations (ie a BLocK ofWords) in the program The actual number is the operand of the BLKW pseudo-op In line 11 the pseudo-op instructs the assembler to set aside one location in memory

Lecture_no16 surprise test

Lecture_no17 MODULE-2

AssemblerWhy learning AssemblerAssembler or other languages that is the question Why should I learn another language if I alreadylearned other programming languages The best argument while you live in France you are able to getthrough by speaking english but you will never feel at home then and life remains complicated You canget through with this but it is rather inappropriate If things need a hurry you should use the countryslanguageAssemblerTo translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmerDisassenblerTo translate machine code to their assembly source

Cross AssemblerThe assembler may run on one machine and produce object code for another

An assembly is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader

(Fig Assembler)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 27

SP NOTES

For example the externally defined symbols (library program) must be indicated to the loaderthe assembler does not know the address of these symbols and it is up to the loader to find theprograms containing them load them into memory and place the values of these symbols in thecalling program

(Fig Program Translation Steps)

Fundamental assembler functions1048698 Translate mnemonic operation codes their machine language equivalents1048698 Assign machine addresses to symbolic labels used by the programmer

Basic Assembler Functions

Convert mnemonic operation codes to machine language equivalents

Convert symbolic operands to machine addresses (pass 1)

Build machine instructions in the proper format

Convert data constants to internal (machine)representations

Error checking is provided

Write the object program and assembly listing files

Assembler directives

-gtThe assembler can also process assembler directivesbull Assembler directives (or pseudo-instructions) provide instructions to the assembler itself Theyare not translated into machine instructions

-gtPseudo-Instructions Not translated into machine instructions Providing information to the assembler-gtBasic assembler directives-

bull START -Specify name and staring address for the program END- Indicate the end of the source program and (optionally) specify the first executable instruction in

the program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 28

SP NOTES

BYTE -Generate character or hexadecimal constant occupying as many bytes as needed to represent the constant

WORD- Generate one-word integer constant RESB- Reserve the indicated number of bytes for a data area RESW- Reserve the indicated number of words for a data area

Lecture_no18 General design procedure of 360-type Assembler

Syntax Assembly language statements

Assembly language statements are entered one statement per line Each statement follows the following format

[label] mnemonic [operands] [comment]

The fields in the square brackets are optional A basic instruction has two parts the first one is the name of the instruction (or the mnemonic) which is to be executed and the second are the operands or the parameters of the command

An assembly language statement contains the following fields -gt Label Field -it is an identifier amp optional fieldThe maximum lengnth of label differs between assemblers It can be used to define a symbolSome accepts upto 32 characters long other only 4 charactersA label when declared is suffixed by a colon and begins with valid characters(AhelliphellipZ)

Ex-

START LDAA 24 H

Here Label START is equal to the address of the instruction LDAA 24 H -gt MnemonicOpcode -This field contains a mnemonic It stands for operation code or pseudo-opie a machine code instruction it requires additional information ieoperands -gt Operand Field -it specifies either the address or the data that the opcode requires -gt Comment Field allows the programmer to document the software This field is optional and is only printed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 29

SP NOTES

on the source listing for documentation purposes The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character

Types of Assembly language statements

It consists of 3 typeseg

(i)Imperative statement

(ii)Declarative statements

(iii)Assembler directive statements

Advantages amp disadvantages of Assembly language

Lecture_no19 Design Specification of 1-pass assembler with example

Following steps are used for design specification of an assembler-1)Identify the information necessary to perform a task2)Specify the data structure and define format of data structure3)Determine the processing necessary to obtain and maintain the information4)Determine the processing necessary to perform a task

Assembler Design can be done in

ndashSingle pass

ndashTwo pass

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 30

Source program

Pass1

OPTAB SYMTAB

Intermediate file

Pass2

SYMTAB

Object codes

OPTAB

SP NOTES

The Intermediate file include each source statement assigned address and error indicator

Pass I

Separate the symbols mnemonic op-code and operational fields Determine the storage requirement for every assembly language statement and up date the location

counter Build the symbol table (Table that is used to store each label and its corresponding value)

Pass II

Generate object code(perform actual translation)

One-Pass Assembler

The translation performed by an assembler is essentially a collection of substitutions machine operation code

for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines

bullSingle Pass AssemblerndashDoes everything in single passndashCannot resolve the forward referencing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 31

LOCCTR

SP NOTES

(detailed pass1 assembler flowchart)

Internal data structures1048698 the Operation Code Table (OPTAB) bull OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents1048698 the Symbol Table (SYMTAB) bull SYMTAB is used to store values (addresses) assigned to labels1048698 Location Counter (LOCCTR) bull This is a variable that is used to help in the assignment of address LOCCTR not LOCCTR + (instruction length)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 32

SP NOTES

Lecture_no20 Multi-pass assembler with example

Pass 1 Assign addresses to all statements in the program Save the values assigned to all labels for use in Pass 2 Perform some processing of assembler directivesPass 2 Assemble instructions Generate data values defined by BYTE WORD Perform processing of assembler directives not done in Pass 1 Write the object program and the assembly listing

(Detailed pass2 assembler flowchart)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 33

SP NOTES

Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler

The difference between one pass and two pass assemblers are-A one pass assembler passes over the source file exactly once in the same pass collecting the labels resolving future references and doing the actual assembly The difficult part is to resolve future label references and assemble code in one pass

A two pass assembler does two passes over the source file ( the second pass can be over a file generated in the first pass ) In the first pass all it does is looks for label definitions and introduces them in the symbol table In the second pass after the symbol table is complete it does the actual assembly by translating the operations and so on

Assembler Design Machine Dependent Assembler Features instruction formats and addressing modes program relocation

Machine Independent Assembler Features literals symbol-defining statements expressions program blocks control sections and program linking

Object Program

Header Col 1 H Col 2~7 Program name Col 8~13 Starting address (hex) Col 14-19 Length of object program in bytes (hex) Text Col1 T Col2~7 Starting address in this record (hex) Col 8~9 Length of object code in this record in bytes (hex) Col 10~69Object code (69-10+1)6=10 instructions End Col1 E Col 2~7 Address of 16 Col2 first executable instruction (hex) (END program_name)

Absolute ProgramThe program must be loaded at the specified (absolute) address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 34

SP NOTES

Relocatable Program

The actual starting address is not known until load time The program can be loaded at different address of memory

How to Solvea Modification recordb Base relativec Program counter relative

Modification record Col 1 M Col 2-7 Starting location of the address field to be modified relative to the beginning of the program Col 8-9 length of the address field to be modified in halfbytes

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 35

SP NOTES

Lecture_no22 Macro language amp Macro processor

MACRO PROCESSOR

The assembly language programmer often finds it necessary to repeat some statements or block of code several times in a program The block may consist of code to swap sets of registers do some arithmetic operations In this situation the programmer find a macro instruction facility useful Macro instruction (often called macros) are single line abbreviation for group of instructions In employing a macro the programmer essentially defines a single instruction to represent a block of code For every occurrence of this one-line macro instruction in his program the macro processing assembler substitute the entire block

A macro represents a commonly used group of statements in the source pro ramming language The macro processor replaces each macro instruction with the corresponding group of source language statements This is called expanding of macros

Obj

Basic Macro Processor Functions

Macro Definition and Usage

In this section we will present one more example to highlight salient aspects of macro-processor The example is very similar to Intels 8 bit microprocessor assembly language instruction

ExampleMACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 36

Source code (with macro)

Macro Processor

Expanded code

Compiler Assembler

SP NOTES

A macro definition is placed at the start of a program enclosed between the statements MACRO and ENDM A MACRO statement indicates that a macro definition starts while statement ENDM indicates the end of a macro definition Thus a group of statements start ing with MACRO and ending with ENDM constitutes one macro definition unit If many macros are to be defined in a program as many definition

modules will exist at the start of the program Each definition module contains a new operation and defines it to consist of a sequence of assembly language statement In example above INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence

The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement Appearance of a macro name in the mnemonic field amounts to a call on the macro The assembler replaces such a statement by the statement sequence comprising the macro This is known as macro expansion

INCRMT XY

is shown to lead to insertion of the assembly statements

LOAD X

ADD Y

STORE X

in its place All macro calls in a program are expanded in this fashion

Defining a Macro

Let us take another look at the macro definition unit

MACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

The macro header statement indicates the existence of a macro definition unit Absence of the header statement as the first statement of a program or ft first statement following a macro definition unit signals the start of the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 37

SP NOTES

main assembly language program The next statement in the definition unit is die prototype for a macro call This statement names the macro and indicates how the operands in any call on the macro would be written

The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion

Basic Macro Processor Functions

-gtDirectives used during usage of Macro

Macro Indicates begin of Macro MEND indicates end of Macro

-gtPrototype for Macro

Each argument starts with Name and macro Parameter list

MEND

-gtBODY The statement will be generated as the expansion of Macro

Macro Expansion

Each macro invocation statement will be expanded into the statements that form the body of the macro

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 38

SP NOTES

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)

In the definition of macro parameter In the macro invocation argument

Lecture_no23 Machine-Independent Macro Processor Features

MACRO INVOCATION

A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro

macro_name p1 p2 hellip

Difference between macro call and procedure call

Macro call statements of the macro body are expanded each time the macro is invoked

Procedure call statements of the subroutine appear only one regardless of how many times the subroutine is called

Macro parameters

i) Positional parameter

The prototype statement indicates how operands in a macro call would be written These operands are called parameters or arguments All parameters used in the prototype statement have names starting with the special character amp These parameters are known as formal parameters A macro call is written using parameter names which do not start with ft special character amp These are known as actual parameters

The lists of formal and actual parameters also called as formal and actual parameter lists specified in the prototype and macro call statements respectively establish a correspondence between each formal parameter and an actual parameter In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists Thus the first actual parameter in the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 39

SP NOTES

fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again

INCRMT ampA ampB prototype

INCRMT XY macro call

We see that X would be paired with ampA and Y with ampB While expanding a macro call any formal parameter appearing within a model statement is replaced by the corresponding actual parameter This is how expansion of the call INCR XY heads to the following statements

LOAD X

ADD Y

STORE X

Example

STRG DATA1 DATA2 DATA3

GENER DIRECT3

(ii) Keyword parameter

Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter

ExampleSTRG ampa3=DATA1 ampa2=DATA2 ampa1=DATA3

GENER TYPE=DIRECT CHANNEL=3

Features of macro facility

(1)Macro instruction argument

(2)Macro calls within macro(nested macro calls)

(3)Macro instruction defining(macros within macro)

(4)Conditional macro ExpansionLecture_no24 Design of 1-pass macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 40

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

14 NUMBER BLKW 115 SIX FILL x000616 17 END

Figure 71 An assembly language programTwo of the parts (LABEL and COMMENTS) are optionalOpcodes and OperandsTwo of the parts (OPCODE and OPERANDS) are mandatory The OPCODE is a symbolic name for the opcode

The number of operands depends on the operation being performedAGAIN ADD R3R3R2The operands to be added are obtained from register 2 and from register 3 The resultis to be placed in register 3 We represent each of the registers 0 through 7 as R0 R2helliphelliphelliphellip R7

The location whose address is to be read is given the label NUMBER The destination into which that address is to be loaded is register 2LEA R2 NUMBERA literal value must contain a symbol identifying the representation base of the number We use for decimal x for hexadecimal and b for binary

LabelsLabels are symbolic names which are used to identify memory locations that are referred to explicitly in the programThere are two reasons for explicitly referring to a memory location1 The location contains the target of a branch instruction (for example AGAIN in line 0E)2 The location contains a value that is loaded or stored (for example NUMBER line 14 and SIX line 15)The location AGAIN is specifically referenced by the branch instruction in line 10

BRp AGAIN

If the result of ADD R1R1ndash1 is positive (as evidenced by the P condition code being set) then the program branches to the location explicitly referenced as AGAIN to perform another iterationThe value stored in the memory location explicitly referenced as NUMBER is loaded into R2If a location in the program is not explicitly referenced then there is no need to give it a label

CommentsComments are messages intended only for human consumption The purpose of comments is to make the program more comprehensible to the human reader

Pseudo-ops (Assembler Directives)

Actually a more formal name for a pseudo-op is assembler directive They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution Rather the pseudo-op is strictly a message to the assembler to help the assembler in the assembly processORIG

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 26

SP NOTES

ORIG tells the assembler where in memory to place the program In line 05 ORIG x3050 says start with location x3050 As a result the LEA R2NUMBER instruction will be put in location x3050FILLFILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand In line 15 the ninth location in the resultant program is initialized to the value x0006BLKWBLKW tells the assembler to set aside some number of sequential memory locations (ie a BLocK ofWords) in the program The actual number is the operand of the BLKW pseudo-op In line 11 the pseudo-op instructs the assembler to set aside one location in memory

Lecture_no16 surprise test

Lecture_no17 MODULE-2

AssemblerWhy learning AssemblerAssembler or other languages that is the question Why should I learn another language if I alreadylearned other programming languages The best argument while you live in France you are able to getthrough by speaking english but you will never feel at home then and life remains complicated You canget through with this but it is rather inappropriate If things need a hurry you should use the countryslanguageAssemblerTo translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmerDisassenblerTo translate machine code to their assembly source

Cross AssemblerThe assembler may run on one machine and produce object code for another

An assembly is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader

(Fig Assembler)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 27

SP NOTES

For example the externally defined symbols (library program) must be indicated to the loaderthe assembler does not know the address of these symbols and it is up to the loader to find theprograms containing them load them into memory and place the values of these symbols in thecalling program

(Fig Program Translation Steps)

Fundamental assembler functions1048698 Translate mnemonic operation codes their machine language equivalents1048698 Assign machine addresses to symbolic labels used by the programmer

Basic Assembler Functions

Convert mnemonic operation codes to machine language equivalents

Convert symbolic operands to machine addresses (pass 1)

Build machine instructions in the proper format

Convert data constants to internal (machine)representations

Error checking is provided

Write the object program and assembly listing files

Assembler directives

-gtThe assembler can also process assembler directivesbull Assembler directives (or pseudo-instructions) provide instructions to the assembler itself Theyare not translated into machine instructions

-gtPseudo-Instructions Not translated into machine instructions Providing information to the assembler-gtBasic assembler directives-

bull START -Specify name and staring address for the program END- Indicate the end of the source program and (optionally) specify the first executable instruction in

the program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 28

SP NOTES

BYTE -Generate character or hexadecimal constant occupying as many bytes as needed to represent the constant

WORD- Generate one-word integer constant RESB- Reserve the indicated number of bytes for a data area RESW- Reserve the indicated number of words for a data area

Lecture_no18 General design procedure of 360-type Assembler

Syntax Assembly language statements

Assembly language statements are entered one statement per line Each statement follows the following format

[label] mnemonic [operands] [comment]

The fields in the square brackets are optional A basic instruction has two parts the first one is the name of the instruction (or the mnemonic) which is to be executed and the second are the operands or the parameters of the command

An assembly language statement contains the following fields -gt Label Field -it is an identifier amp optional fieldThe maximum lengnth of label differs between assemblers It can be used to define a symbolSome accepts upto 32 characters long other only 4 charactersA label when declared is suffixed by a colon and begins with valid characters(AhelliphellipZ)

Ex-

START LDAA 24 H

Here Label START is equal to the address of the instruction LDAA 24 H -gt MnemonicOpcode -This field contains a mnemonic It stands for operation code or pseudo-opie a machine code instruction it requires additional information ieoperands -gt Operand Field -it specifies either the address or the data that the opcode requires -gt Comment Field allows the programmer to document the software This field is optional and is only printed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 29

SP NOTES

on the source listing for documentation purposes The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character

Types of Assembly language statements

It consists of 3 typeseg

(i)Imperative statement

(ii)Declarative statements

(iii)Assembler directive statements

Advantages amp disadvantages of Assembly language

Lecture_no19 Design Specification of 1-pass assembler with example

Following steps are used for design specification of an assembler-1)Identify the information necessary to perform a task2)Specify the data structure and define format of data structure3)Determine the processing necessary to obtain and maintain the information4)Determine the processing necessary to perform a task

Assembler Design can be done in

ndashSingle pass

ndashTwo pass

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 30

Source program

Pass1

OPTAB SYMTAB

Intermediate file

Pass2

SYMTAB

Object codes

OPTAB

SP NOTES

The Intermediate file include each source statement assigned address and error indicator

Pass I

Separate the symbols mnemonic op-code and operational fields Determine the storage requirement for every assembly language statement and up date the location

counter Build the symbol table (Table that is used to store each label and its corresponding value)

Pass II

Generate object code(perform actual translation)

One-Pass Assembler

The translation performed by an assembler is essentially a collection of substitutions machine operation code

for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines

bullSingle Pass AssemblerndashDoes everything in single passndashCannot resolve the forward referencing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 31

LOCCTR

SP NOTES

(detailed pass1 assembler flowchart)

Internal data structures1048698 the Operation Code Table (OPTAB) bull OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents1048698 the Symbol Table (SYMTAB) bull SYMTAB is used to store values (addresses) assigned to labels1048698 Location Counter (LOCCTR) bull This is a variable that is used to help in the assignment of address LOCCTR not LOCCTR + (instruction length)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 32

SP NOTES

Lecture_no20 Multi-pass assembler with example

Pass 1 Assign addresses to all statements in the program Save the values assigned to all labels for use in Pass 2 Perform some processing of assembler directivesPass 2 Assemble instructions Generate data values defined by BYTE WORD Perform processing of assembler directives not done in Pass 1 Write the object program and the assembly listing

(Detailed pass2 assembler flowchart)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 33

SP NOTES

Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler

The difference between one pass and two pass assemblers are-A one pass assembler passes over the source file exactly once in the same pass collecting the labels resolving future references and doing the actual assembly The difficult part is to resolve future label references and assemble code in one pass

A two pass assembler does two passes over the source file ( the second pass can be over a file generated in the first pass ) In the first pass all it does is looks for label definitions and introduces them in the symbol table In the second pass after the symbol table is complete it does the actual assembly by translating the operations and so on

Assembler Design Machine Dependent Assembler Features instruction formats and addressing modes program relocation

Machine Independent Assembler Features literals symbol-defining statements expressions program blocks control sections and program linking

Object Program

Header Col 1 H Col 2~7 Program name Col 8~13 Starting address (hex) Col 14-19 Length of object program in bytes (hex) Text Col1 T Col2~7 Starting address in this record (hex) Col 8~9 Length of object code in this record in bytes (hex) Col 10~69Object code (69-10+1)6=10 instructions End Col1 E Col 2~7 Address of 16 Col2 first executable instruction (hex) (END program_name)

Absolute ProgramThe program must be loaded at the specified (absolute) address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 34

SP NOTES

Relocatable Program

The actual starting address is not known until load time The program can be loaded at different address of memory

How to Solvea Modification recordb Base relativec Program counter relative

Modification record Col 1 M Col 2-7 Starting location of the address field to be modified relative to the beginning of the program Col 8-9 length of the address field to be modified in halfbytes

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 35

SP NOTES

Lecture_no22 Macro language amp Macro processor

MACRO PROCESSOR

The assembly language programmer often finds it necessary to repeat some statements or block of code several times in a program The block may consist of code to swap sets of registers do some arithmetic operations In this situation the programmer find a macro instruction facility useful Macro instruction (often called macros) are single line abbreviation for group of instructions In employing a macro the programmer essentially defines a single instruction to represent a block of code For every occurrence of this one-line macro instruction in his program the macro processing assembler substitute the entire block

A macro represents a commonly used group of statements in the source pro ramming language The macro processor replaces each macro instruction with the corresponding group of source language statements This is called expanding of macros

Obj

Basic Macro Processor Functions

Macro Definition and Usage

In this section we will present one more example to highlight salient aspects of macro-processor The example is very similar to Intels 8 bit microprocessor assembly language instruction

ExampleMACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 36

Source code (with macro)

Macro Processor

Expanded code

Compiler Assembler

SP NOTES

A macro definition is placed at the start of a program enclosed between the statements MACRO and ENDM A MACRO statement indicates that a macro definition starts while statement ENDM indicates the end of a macro definition Thus a group of statements start ing with MACRO and ending with ENDM constitutes one macro definition unit If many macros are to be defined in a program as many definition

modules will exist at the start of the program Each definition module contains a new operation and defines it to consist of a sequence of assembly language statement In example above INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence

The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement Appearance of a macro name in the mnemonic field amounts to a call on the macro The assembler replaces such a statement by the statement sequence comprising the macro This is known as macro expansion

INCRMT XY

is shown to lead to insertion of the assembly statements

LOAD X

ADD Y

STORE X

in its place All macro calls in a program are expanded in this fashion

Defining a Macro

Let us take another look at the macro definition unit

MACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

The macro header statement indicates the existence of a macro definition unit Absence of the header statement as the first statement of a program or ft first statement following a macro definition unit signals the start of the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 37

SP NOTES

main assembly language program The next statement in the definition unit is die prototype for a macro call This statement names the macro and indicates how the operands in any call on the macro would be written

The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion

Basic Macro Processor Functions

-gtDirectives used during usage of Macro

Macro Indicates begin of Macro MEND indicates end of Macro

-gtPrototype for Macro

Each argument starts with Name and macro Parameter list

MEND

-gtBODY The statement will be generated as the expansion of Macro

Macro Expansion

Each macro invocation statement will be expanded into the statements that form the body of the macro

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 38

SP NOTES

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)

In the definition of macro parameter In the macro invocation argument

Lecture_no23 Machine-Independent Macro Processor Features

MACRO INVOCATION

A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro

macro_name p1 p2 hellip

Difference between macro call and procedure call

Macro call statements of the macro body are expanded each time the macro is invoked

Procedure call statements of the subroutine appear only one regardless of how many times the subroutine is called

Macro parameters

i) Positional parameter

The prototype statement indicates how operands in a macro call would be written These operands are called parameters or arguments All parameters used in the prototype statement have names starting with the special character amp These parameters are known as formal parameters A macro call is written using parameter names which do not start with ft special character amp These are known as actual parameters

The lists of formal and actual parameters also called as formal and actual parameter lists specified in the prototype and macro call statements respectively establish a correspondence between each formal parameter and an actual parameter In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists Thus the first actual parameter in the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 39

SP NOTES

fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again

INCRMT ampA ampB prototype

INCRMT XY macro call

We see that X would be paired with ampA and Y with ampB While expanding a macro call any formal parameter appearing within a model statement is replaced by the corresponding actual parameter This is how expansion of the call INCR XY heads to the following statements

LOAD X

ADD Y

STORE X

Example

STRG DATA1 DATA2 DATA3

GENER DIRECT3

(ii) Keyword parameter

Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter

ExampleSTRG ampa3=DATA1 ampa2=DATA2 ampa1=DATA3

GENER TYPE=DIRECT CHANNEL=3

Features of macro facility

(1)Macro instruction argument

(2)Macro calls within macro(nested macro calls)

(3)Macro instruction defining(macros within macro)

(4)Conditional macro ExpansionLecture_no24 Design of 1-pass macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 40

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

ORIG tells the assembler where in memory to place the program In line 05 ORIG x3050 says start with location x3050 As a result the LEA R2NUMBER instruction will be put in location x3050FILLFILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand In line 15 the ninth location in the resultant program is initialized to the value x0006BLKWBLKW tells the assembler to set aside some number of sequential memory locations (ie a BLocK ofWords) in the program The actual number is the operand of the BLKW pseudo-op In line 11 the pseudo-op instructs the assembler to set aside one location in memory

Lecture_no16 surprise test

Lecture_no17 MODULE-2

AssemblerWhy learning AssemblerAssembler or other languages that is the question Why should I learn another language if I alreadylearned other programming languages The best argument while you live in France you are able to getthrough by speaking english but you will never feel at home then and life remains complicated You canget through with this but it is rather inappropriate If things need a hurry you should use the countryslanguageAssemblerTo translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmerDisassenblerTo translate machine code to their assembly source

Cross AssemblerThe assembler may run on one machine and produce object code for another

An assembly is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader

(Fig Assembler)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 27

SP NOTES

For example the externally defined symbols (library program) must be indicated to the loaderthe assembler does not know the address of these symbols and it is up to the loader to find theprograms containing them load them into memory and place the values of these symbols in thecalling program

(Fig Program Translation Steps)

Fundamental assembler functions1048698 Translate mnemonic operation codes their machine language equivalents1048698 Assign machine addresses to symbolic labels used by the programmer

Basic Assembler Functions

Convert mnemonic operation codes to machine language equivalents

Convert symbolic operands to machine addresses (pass 1)

Build machine instructions in the proper format

Convert data constants to internal (machine)representations

Error checking is provided

Write the object program and assembly listing files

Assembler directives

-gtThe assembler can also process assembler directivesbull Assembler directives (or pseudo-instructions) provide instructions to the assembler itself Theyare not translated into machine instructions

-gtPseudo-Instructions Not translated into machine instructions Providing information to the assembler-gtBasic assembler directives-

bull START -Specify name and staring address for the program END- Indicate the end of the source program and (optionally) specify the first executable instruction in

the program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 28

SP NOTES

BYTE -Generate character or hexadecimal constant occupying as many bytes as needed to represent the constant

WORD- Generate one-word integer constant RESB- Reserve the indicated number of bytes for a data area RESW- Reserve the indicated number of words for a data area

Lecture_no18 General design procedure of 360-type Assembler

Syntax Assembly language statements

Assembly language statements are entered one statement per line Each statement follows the following format

[label] mnemonic [operands] [comment]

The fields in the square brackets are optional A basic instruction has two parts the first one is the name of the instruction (or the mnemonic) which is to be executed and the second are the operands or the parameters of the command

An assembly language statement contains the following fields -gt Label Field -it is an identifier amp optional fieldThe maximum lengnth of label differs between assemblers It can be used to define a symbolSome accepts upto 32 characters long other only 4 charactersA label when declared is suffixed by a colon and begins with valid characters(AhelliphellipZ)

Ex-

START LDAA 24 H

Here Label START is equal to the address of the instruction LDAA 24 H -gt MnemonicOpcode -This field contains a mnemonic It stands for operation code or pseudo-opie a machine code instruction it requires additional information ieoperands -gt Operand Field -it specifies either the address or the data that the opcode requires -gt Comment Field allows the programmer to document the software This field is optional and is only printed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 29

SP NOTES

on the source listing for documentation purposes The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character

Types of Assembly language statements

It consists of 3 typeseg

(i)Imperative statement

(ii)Declarative statements

(iii)Assembler directive statements

Advantages amp disadvantages of Assembly language

Lecture_no19 Design Specification of 1-pass assembler with example

Following steps are used for design specification of an assembler-1)Identify the information necessary to perform a task2)Specify the data structure and define format of data structure3)Determine the processing necessary to obtain and maintain the information4)Determine the processing necessary to perform a task

Assembler Design can be done in

ndashSingle pass

ndashTwo pass

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 30

Source program

Pass1

OPTAB SYMTAB

Intermediate file

Pass2

SYMTAB

Object codes

OPTAB

SP NOTES

The Intermediate file include each source statement assigned address and error indicator

Pass I

Separate the symbols mnemonic op-code and operational fields Determine the storage requirement for every assembly language statement and up date the location

counter Build the symbol table (Table that is used to store each label and its corresponding value)

Pass II

Generate object code(perform actual translation)

One-Pass Assembler

The translation performed by an assembler is essentially a collection of substitutions machine operation code

for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines

bullSingle Pass AssemblerndashDoes everything in single passndashCannot resolve the forward referencing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 31

LOCCTR

SP NOTES

(detailed pass1 assembler flowchart)

Internal data structures1048698 the Operation Code Table (OPTAB) bull OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents1048698 the Symbol Table (SYMTAB) bull SYMTAB is used to store values (addresses) assigned to labels1048698 Location Counter (LOCCTR) bull This is a variable that is used to help in the assignment of address LOCCTR not LOCCTR + (instruction length)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 32

SP NOTES

Lecture_no20 Multi-pass assembler with example

Pass 1 Assign addresses to all statements in the program Save the values assigned to all labels for use in Pass 2 Perform some processing of assembler directivesPass 2 Assemble instructions Generate data values defined by BYTE WORD Perform processing of assembler directives not done in Pass 1 Write the object program and the assembly listing

(Detailed pass2 assembler flowchart)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 33

SP NOTES

Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler

The difference between one pass and two pass assemblers are-A one pass assembler passes over the source file exactly once in the same pass collecting the labels resolving future references and doing the actual assembly The difficult part is to resolve future label references and assemble code in one pass

A two pass assembler does two passes over the source file ( the second pass can be over a file generated in the first pass ) In the first pass all it does is looks for label definitions and introduces them in the symbol table In the second pass after the symbol table is complete it does the actual assembly by translating the operations and so on

Assembler Design Machine Dependent Assembler Features instruction formats and addressing modes program relocation

Machine Independent Assembler Features literals symbol-defining statements expressions program blocks control sections and program linking

Object Program

Header Col 1 H Col 2~7 Program name Col 8~13 Starting address (hex) Col 14-19 Length of object program in bytes (hex) Text Col1 T Col2~7 Starting address in this record (hex) Col 8~9 Length of object code in this record in bytes (hex) Col 10~69Object code (69-10+1)6=10 instructions End Col1 E Col 2~7 Address of 16 Col2 first executable instruction (hex) (END program_name)

Absolute ProgramThe program must be loaded at the specified (absolute) address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 34

SP NOTES

Relocatable Program

The actual starting address is not known until load time The program can be loaded at different address of memory

How to Solvea Modification recordb Base relativec Program counter relative

Modification record Col 1 M Col 2-7 Starting location of the address field to be modified relative to the beginning of the program Col 8-9 length of the address field to be modified in halfbytes

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 35

SP NOTES

Lecture_no22 Macro language amp Macro processor

MACRO PROCESSOR

The assembly language programmer often finds it necessary to repeat some statements or block of code several times in a program The block may consist of code to swap sets of registers do some arithmetic operations In this situation the programmer find a macro instruction facility useful Macro instruction (often called macros) are single line abbreviation for group of instructions In employing a macro the programmer essentially defines a single instruction to represent a block of code For every occurrence of this one-line macro instruction in his program the macro processing assembler substitute the entire block

A macro represents a commonly used group of statements in the source pro ramming language The macro processor replaces each macro instruction with the corresponding group of source language statements This is called expanding of macros

Obj

Basic Macro Processor Functions

Macro Definition and Usage

In this section we will present one more example to highlight salient aspects of macro-processor The example is very similar to Intels 8 bit microprocessor assembly language instruction

ExampleMACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 36

Source code (with macro)

Macro Processor

Expanded code

Compiler Assembler

SP NOTES

A macro definition is placed at the start of a program enclosed between the statements MACRO and ENDM A MACRO statement indicates that a macro definition starts while statement ENDM indicates the end of a macro definition Thus a group of statements start ing with MACRO and ending with ENDM constitutes one macro definition unit If many macros are to be defined in a program as many definition

modules will exist at the start of the program Each definition module contains a new operation and defines it to consist of a sequence of assembly language statement In example above INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence

The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement Appearance of a macro name in the mnemonic field amounts to a call on the macro The assembler replaces such a statement by the statement sequence comprising the macro This is known as macro expansion

INCRMT XY

is shown to lead to insertion of the assembly statements

LOAD X

ADD Y

STORE X

in its place All macro calls in a program are expanded in this fashion

Defining a Macro

Let us take another look at the macro definition unit

MACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

The macro header statement indicates the existence of a macro definition unit Absence of the header statement as the first statement of a program or ft first statement following a macro definition unit signals the start of the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 37

SP NOTES

main assembly language program The next statement in the definition unit is die prototype for a macro call This statement names the macro and indicates how the operands in any call on the macro would be written

The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion

Basic Macro Processor Functions

-gtDirectives used during usage of Macro

Macro Indicates begin of Macro MEND indicates end of Macro

-gtPrototype for Macro

Each argument starts with Name and macro Parameter list

MEND

-gtBODY The statement will be generated as the expansion of Macro

Macro Expansion

Each macro invocation statement will be expanded into the statements that form the body of the macro

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 38

SP NOTES

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)

In the definition of macro parameter In the macro invocation argument

Lecture_no23 Machine-Independent Macro Processor Features

MACRO INVOCATION

A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro

macro_name p1 p2 hellip

Difference between macro call and procedure call

Macro call statements of the macro body are expanded each time the macro is invoked

Procedure call statements of the subroutine appear only one regardless of how many times the subroutine is called

Macro parameters

i) Positional parameter

The prototype statement indicates how operands in a macro call would be written These operands are called parameters or arguments All parameters used in the prototype statement have names starting with the special character amp These parameters are known as formal parameters A macro call is written using parameter names which do not start with ft special character amp These are known as actual parameters

The lists of formal and actual parameters also called as formal and actual parameter lists specified in the prototype and macro call statements respectively establish a correspondence between each formal parameter and an actual parameter In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists Thus the first actual parameter in the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 39

SP NOTES

fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again

INCRMT ampA ampB prototype

INCRMT XY macro call

We see that X would be paired with ampA and Y with ampB While expanding a macro call any formal parameter appearing within a model statement is replaced by the corresponding actual parameter This is how expansion of the call INCR XY heads to the following statements

LOAD X

ADD Y

STORE X

Example

STRG DATA1 DATA2 DATA3

GENER DIRECT3

(ii) Keyword parameter

Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter

ExampleSTRG ampa3=DATA1 ampa2=DATA2 ampa1=DATA3

GENER TYPE=DIRECT CHANNEL=3

Features of macro facility

(1)Macro instruction argument

(2)Macro calls within macro(nested macro calls)

(3)Macro instruction defining(macros within macro)

(4)Conditional macro ExpansionLecture_no24 Design of 1-pass macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 40

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

For example the externally defined symbols (library program) must be indicated to the loaderthe assembler does not know the address of these symbols and it is up to the loader to find theprograms containing them load them into memory and place the values of these symbols in thecalling program

(Fig Program Translation Steps)

Fundamental assembler functions1048698 Translate mnemonic operation codes their machine language equivalents1048698 Assign machine addresses to symbolic labels used by the programmer

Basic Assembler Functions

Convert mnemonic operation codes to machine language equivalents

Convert symbolic operands to machine addresses (pass 1)

Build machine instructions in the proper format

Convert data constants to internal (machine)representations

Error checking is provided

Write the object program and assembly listing files

Assembler directives

-gtThe assembler can also process assembler directivesbull Assembler directives (or pseudo-instructions) provide instructions to the assembler itself Theyare not translated into machine instructions

-gtPseudo-Instructions Not translated into machine instructions Providing information to the assembler-gtBasic assembler directives-

bull START -Specify name and staring address for the program END- Indicate the end of the source program and (optionally) specify the first executable instruction in

the program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 28

SP NOTES

BYTE -Generate character or hexadecimal constant occupying as many bytes as needed to represent the constant

WORD- Generate one-word integer constant RESB- Reserve the indicated number of bytes for a data area RESW- Reserve the indicated number of words for a data area

Lecture_no18 General design procedure of 360-type Assembler

Syntax Assembly language statements

Assembly language statements are entered one statement per line Each statement follows the following format

[label] mnemonic [operands] [comment]

The fields in the square brackets are optional A basic instruction has two parts the first one is the name of the instruction (or the mnemonic) which is to be executed and the second are the operands or the parameters of the command

An assembly language statement contains the following fields -gt Label Field -it is an identifier amp optional fieldThe maximum lengnth of label differs between assemblers It can be used to define a symbolSome accepts upto 32 characters long other only 4 charactersA label when declared is suffixed by a colon and begins with valid characters(AhelliphellipZ)

Ex-

START LDAA 24 H

Here Label START is equal to the address of the instruction LDAA 24 H -gt MnemonicOpcode -This field contains a mnemonic It stands for operation code or pseudo-opie a machine code instruction it requires additional information ieoperands -gt Operand Field -it specifies either the address or the data that the opcode requires -gt Comment Field allows the programmer to document the software This field is optional and is only printed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 29

SP NOTES

on the source listing for documentation purposes The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character

Types of Assembly language statements

It consists of 3 typeseg

(i)Imperative statement

(ii)Declarative statements

(iii)Assembler directive statements

Advantages amp disadvantages of Assembly language

Lecture_no19 Design Specification of 1-pass assembler with example

Following steps are used for design specification of an assembler-1)Identify the information necessary to perform a task2)Specify the data structure and define format of data structure3)Determine the processing necessary to obtain and maintain the information4)Determine the processing necessary to perform a task

Assembler Design can be done in

ndashSingle pass

ndashTwo pass

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 30

Source program

Pass1

OPTAB SYMTAB

Intermediate file

Pass2

SYMTAB

Object codes

OPTAB

SP NOTES

The Intermediate file include each source statement assigned address and error indicator

Pass I

Separate the symbols mnemonic op-code and operational fields Determine the storage requirement for every assembly language statement and up date the location

counter Build the symbol table (Table that is used to store each label and its corresponding value)

Pass II

Generate object code(perform actual translation)

One-Pass Assembler

The translation performed by an assembler is essentially a collection of substitutions machine operation code

for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines

bullSingle Pass AssemblerndashDoes everything in single passndashCannot resolve the forward referencing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 31

LOCCTR

SP NOTES

(detailed pass1 assembler flowchart)

Internal data structures1048698 the Operation Code Table (OPTAB) bull OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents1048698 the Symbol Table (SYMTAB) bull SYMTAB is used to store values (addresses) assigned to labels1048698 Location Counter (LOCCTR) bull This is a variable that is used to help in the assignment of address LOCCTR not LOCCTR + (instruction length)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 32

SP NOTES

Lecture_no20 Multi-pass assembler with example

Pass 1 Assign addresses to all statements in the program Save the values assigned to all labels for use in Pass 2 Perform some processing of assembler directivesPass 2 Assemble instructions Generate data values defined by BYTE WORD Perform processing of assembler directives not done in Pass 1 Write the object program and the assembly listing

(Detailed pass2 assembler flowchart)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 33

SP NOTES

Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler

The difference between one pass and two pass assemblers are-A one pass assembler passes over the source file exactly once in the same pass collecting the labels resolving future references and doing the actual assembly The difficult part is to resolve future label references and assemble code in one pass

A two pass assembler does two passes over the source file ( the second pass can be over a file generated in the first pass ) In the first pass all it does is looks for label definitions and introduces them in the symbol table In the second pass after the symbol table is complete it does the actual assembly by translating the operations and so on

Assembler Design Machine Dependent Assembler Features instruction formats and addressing modes program relocation

Machine Independent Assembler Features literals symbol-defining statements expressions program blocks control sections and program linking

Object Program

Header Col 1 H Col 2~7 Program name Col 8~13 Starting address (hex) Col 14-19 Length of object program in bytes (hex) Text Col1 T Col2~7 Starting address in this record (hex) Col 8~9 Length of object code in this record in bytes (hex) Col 10~69Object code (69-10+1)6=10 instructions End Col1 E Col 2~7 Address of 16 Col2 first executable instruction (hex) (END program_name)

Absolute ProgramThe program must be loaded at the specified (absolute) address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 34

SP NOTES

Relocatable Program

The actual starting address is not known until load time The program can be loaded at different address of memory

How to Solvea Modification recordb Base relativec Program counter relative

Modification record Col 1 M Col 2-7 Starting location of the address field to be modified relative to the beginning of the program Col 8-9 length of the address field to be modified in halfbytes

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 35

SP NOTES

Lecture_no22 Macro language amp Macro processor

MACRO PROCESSOR

The assembly language programmer often finds it necessary to repeat some statements or block of code several times in a program The block may consist of code to swap sets of registers do some arithmetic operations In this situation the programmer find a macro instruction facility useful Macro instruction (often called macros) are single line abbreviation for group of instructions In employing a macro the programmer essentially defines a single instruction to represent a block of code For every occurrence of this one-line macro instruction in his program the macro processing assembler substitute the entire block

A macro represents a commonly used group of statements in the source pro ramming language The macro processor replaces each macro instruction with the corresponding group of source language statements This is called expanding of macros

Obj

Basic Macro Processor Functions

Macro Definition and Usage

In this section we will present one more example to highlight salient aspects of macro-processor The example is very similar to Intels 8 bit microprocessor assembly language instruction

ExampleMACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 36

Source code (with macro)

Macro Processor

Expanded code

Compiler Assembler

SP NOTES

A macro definition is placed at the start of a program enclosed between the statements MACRO and ENDM A MACRO statement indicates that a macro definition starts while statement ENDM indicates the end of a macro definition Thus a group of statements start ing with MACRO and ending with ENDM constitutes one macro definition unit If many macros are to be defined in a program as many definition

modules will exist at the start of the program Each definition module contains a new operation and defines it to consist of a sequence of assembly language statement In example above INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence

The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement Appearance of a macro name in the mnemonic field amounts to a call on the macro The assembler replaces such a statement by the statement sequence comprising the macro This is known as macro expansion

INCRMT XY

is shown to lead to insertion of the assembly statements

LOAD X

ADD Y

STORE X

in its place All macro calls in a program are expanded in this fashion

Defining a Macro

Let us take another look at the macro definition unit

MACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

The macro header statement indicates the existence of a macro definition unit Absence of the header statement as the first statement of a program or ft first statement following a macro definition unit signals the start of the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 37

SP NOTES

main assembly language program The next statement in the definition unit is die prototype for a macro call This statement names the macro and indicates how the operands in any call on the macro would be written

The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion

Basic Macro Processor Functions

-gtDirectives used during usage of Macro

Macro Indicates begin of Macro MEND indicates end of Macro

-gtPrototype for Macro

Each argument starts with Name and macro Parameter list

MEND

-gtBODY The statement will be generated as the expansion of Macro

Macro Expansion

Each macro invocation statement will be expanded into the statements that form the body of the macro

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 38

SP NOTES

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)

In the definition of macro parameter In the macro invocation argument

Lecture_no23 Machine-Independent Macro Processor Features

MACRO INVOCATION

A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro

macro_name p1 p2 hellip

Difference between macro call and procedure call

Macro call statements of the macro body are expanded each time the macro is invoked

Procedure call statements of the subroutine appear only one regardless of how many times the subroutine is called

Macro parameters

i) Positional parameter

The prototype statement indicates how operands in a macro call would be written These operands are called parameters or arguments All parameters used in the prototype statement have names starting with the special character amp These parameters are known as formal parameters A macro call is written using parameter names which do not start with ft special character amp These are known as actual parameters

The lists of formal and actual parameters also called as formal and actual parameter lists specified in the prototype and macro call statements respectively establish a correspondence between each formal parameter and an actual parameter In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists Thus the first actual parameter in the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 39

SP NOTES

fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again

INCRMT ampA ampB prototype

INCRMT XY macro call

We see that X would be paired with ampA and Y with ampB While expanding a macro call any formal parameter appearing within a model statement is replaced by the corresponding actual parameter This is how expansion of the call INCR XY heads to the following statements

LOAD X

ADD Y

STORE X

Example

STRG DATA1 DATA2 DATA3

GENER DIRECT3

(ii) Keyword parameter

Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter

ExampleSTRG ampa3=DATA1 ampa2=DATA2 ampa1=DATA3

GENER TYPE=DIRECT CHANNEL=3

Features of macro facility

(1)Macro instruction argument

(2)Macro calls within macro(nested macro calls)

(3)Macro instruction defining(macros within macro)

(4)Conditional macro ExpansionLecture_no24 Design of 1-pass macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 40

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

BYTE -Generate character or hexadecimal constant occupying as many bytes as needed to represent the constant

WORD- Generate one-word integer constant RESB- Reserve the indicated number of bytes for a data area RESW- Reserve the indicated number of words for a data area

Lecture_no18 General design procedure of 360-type Assembler

Syntax Assembly language statements

Assembly language statements are entered one statement per line Each statement follows the following format

[label] mnemonic [operands] [comment]

The fields in the square brackets are optional A basic instruction has two parts the first one is the name of the instruction (or the mnemonic) which is to be executed and the second are the operands or the parameters of the command

An assembly language statement contains the following fields -gt Label Field -it is an identifier amp optional fieldThe maximum lengnth of label differs between assemblers It can be used to define a symbolSome accepts upto 32 characters long other only 4 charactersA label when declared is suffixed by a colon and begins with valid characters(AhelliphellipZ)

Ex-

START LDAA 24 H

Here Label START is equal to the address of the instruction LDAA 24 H -gt MnemonicOpcode -This field contains a mnemonic It stands for operation code or pseudo-opie a machine code instruction it requires additional information ieoperands -gt Operand Field -it specifies either the address or the data that the opcode requires -gt Comment Field allows the programmer to document the software This field is optional and is only printed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 29

SP NOTES

on the source listing for documentation purposes The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character

Types of Assembly language statements

It consists of 3 typeseg

(i)Imperative statement

(ii)Declarative statements

(iii)Assembler directive statements

Advantages amp disadvantages of Assembly language

Lecture_no19 Design Specification of 1-pass assembler with example

Following steps are used for design specification of an assembler-1)Identify the information necessary to perform a task2)Specify the data structure and define format of data structure3)Determine the processing necessary to obtain and maintain the information4)Determine the processing necessary to perform a task

Assembler Design can be done in

ndashSingle pass

ndashTwo pass

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 30

Source program

Pass1

OPTAB SYMTAB

Intermediate file

Pass2

SYMTAB

Object codes

OPTAB

SP NOTES

The Intermediate file include each source statement assigned address and error indicator

Pass I

Separate the symbols mnemonic op-code and operational fields Determine the storage requirement for every assembly language statement and up date the location

counter Build the symbol table (Table that is used to store each label and its corresponding value)

Pass II

Generate object code(perform actual translation)

One-Pass Assembler

The translation performed by an assembler is essentially a collection of substitutions machine operation code

for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines

bullSingle Pass AssemblerndashDoes everything in single passndashCannot resolve the forward referencing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 31

LOCCTR

SP NOTES

(detailed pass1 assembler flowchart)

Internal data structures1048698 the Operation Code Table (OPTAB) bull OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents1048698 the Symbol Table (SYMTAB) bull SYMTAB is used to store values (addresses) assigned to labels1048698 Location Counter (LOCCTR) bull This is a variable that is used to help in the assignment of address LOCCTR not LOCCTR + (instruction length)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 32

SP NOTES

Lecture_no20 Multi-pass assembler with example

Pass 1 Assign addresses to all statements in the program Save the values assigned to all labels for use in Pass 2 Perform some processing of assembler directivesPass 2 Assemble instructions Generate data values defined by BYTE WORD Perform processing of assembler directives not done in Pass 1 Write the object program and the assembly listing

(Detailed pass2 assembler flowchart)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 33

SP NOTES

Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler

The difference between one pass and two pass assemblers are-A one pass assembler passes over the source file exactly once in the same pass collecting the labels resolving future references and doing the actual assembly The difficult part is to resolve future label references and assemble code in one pass

A two pass assembler does two passes over the source file ( the second pass can be over a file generated in the first pass ) In the first pass all it does is looks for label definitions and introduces them in the symbol table In the second pass after the symbol table is complete it does the actual assembly by translating the operations and so on

Assembler Design Machine Dependent Assembler Features instruction formats and addressing modes program relocation

Machine Independent Assembler Features literals symbol-defining statements expressions program blocks control sections and program linking

Object Program

Header Col 1 H Col 2~7 Program name Col 8~13 Starting address (hex) Col 14-19 Length of object program in bytes (hex) Text Col1 T Col2~7 Starting address in this record (hex) Col 8~9 Length of object code in this record in bytes (hex) Col 10~69Object code (69-10+1)6=10 instructions End Col1 E Col 2~7 Address of 16 Col2 first executable instruction (hex) (END program_name)

Absolute ProgramThe program must be loaded at the specified (absolute) address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 34

SP NOTES

Relocatable Program

The actual starting address is not known until load time The program can be loaded at different address of memory

How to Solvea Modification recordb Base relativec Program counter relative

Modification record Col 1 M Col 2-7 Starting location of the address field to be modified relative to the beginning of the program Col 8-9 length of the address field to be modified in halfbytes

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 35

SP NOTES

Lecture_no22 Macro language amp Macro processor

MACRO PROCESSOR

The assembly language programmer often finds it necessary to repeat some statements or block of code several times in a program The block may consist of code to swap sets of registers do some arithmetic operations In this situation the programmer find a macro instruction facility useful Macro instruction (often called macros) are single line abbreviation for group of instructions In employing a macro the programmer essentially defines a single instruction to represent a block of code For every occurrence of this one-line macro instruction in his program the macro processing assembler substitute the entire block

A macro represents a commonly used group of statements in the source pro ramming language The macro processor replaces each macro instruction with the corresponding group of source language statements This is called expanding of macros

Obj

Basic Macro Processor Functions

Macro Definition and Usage

In this section we will present one more example to highlight salient aspects of macro-processor The example is very similar to Intels 8 bit microprocessor assembly language instruction

ExampleMACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 36

Source code (with macro)

Macro Processor

Expanded code

Compiler Assembler

SP NOTES

A macro definition is placed at the start of a program enclosed between the statements MACRO and ENDM A MACRO statement indicates that a macro definition starts while statement ENDM indicates the end of a macro definition Thus a group of statements start ing with MACRO and ending with ENDM constitutes one macro definition unit If many macros are to be defined in a program as many definition

modules will exist at the start of the program Each definition module contains a new operation and defines it to consist of a sequence of assembly language statement In example above INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence

The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement Appearance of a macro name in the mnemonic field amounts to a call on the macro The assembler replaces such a statement by the statement sequence comprising the macro This is known as macro expansion

INCRMT XY

is shown to lead to insertion of the assembly statements

LOAD X

ADD Y

STORE X

in its place All macro calls in a program are expanded in this fashion

Defining a Macro

Let us take another look at the macro definition unit

MACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

The macro header statement indicates the existence of a macro definition unit Absence of the header statement as the first statement of a program or ft first statement following a macro definition unit signals the start of the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 37

SP NOTES

main assembly language program The next statement in the definition unit is die prototype for a macro call This statement names the macro and indicates how the operands in any call on the macro would be written

The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion

Basic Macro Processor Functions

-gtDirectives used during usage of Macro

Macro Indicates begin of Macro MEND indicates end of Macro

-gtPrototype for Macro

Each argument starts with Name and macro Parameter list

MEND

-gtBODY The statement will be generated as the expansion of Macro

Macro Expansion

Each macro invocation statement will be expanded into the statements that form the body of the macro

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 38

SP NOTES

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)

In the definition of macro parameter In the macro invocation argument

Lecture_no23 Machine-Independent Macro Processor Features

MACRO INVOCATION

A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro

macro_name p1 p2 hellip

Difference between macro call and procedure call

Macro call statements of the macro body are expanded each time the macro is invoked

Procedure call statements of the subroutine appear only one regardless of how many times the subroutine is called

Macro parameters

i) Positional parameter

The prototype statement indicates how operands in a macro call would be written These operands are called parameters or arguments All parameters used in the prototype statement have names starting with the special character amp These parameters are known as formal parameters A macro call is written using parameter names which do not start with ft special character amp These are known as actual parameters

The lists of formal and actual parameters also called as formal and actual parameter lists specified in the prototype and macro call statements respectively establish a correspondence between each formal parameter and an actual parameter In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists Thus the first actual parameter in the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 39

SP NOTES

fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again

INCRMT ampA ampB prototype

INCRMT XY macro call

We see that X would be paired with ampA and Y with ampB While expanding a macro call any formal parameter appearing within a model statement is replaced by the corresponding actual parameter This is how expansion of the call INCR XY heads to the following statements

LOAD X

ADD Y

STORE X

Example

STRG DATA1 DATA2 DATA3

GENER DIRECT3

(ii) Keyword parameter

Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter

ExampleSTRG ampa3=DATA1 ampa2=DATA2 ampa1=DATA3

GENER TYPE=DIRECT CHANNEL=3

Features of macro facility

(1)Macro instruction argument

(2)Macro calls within macro(nested macro calls)

(3)Macro instruction defining(macros within macro)

(4)Conditional macro ExpansionLecture_no24 Design of 1-pass macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 40

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

on the source listing for documentation purposes The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character

Types of Assembly language statements

It consists of 3 typeseg

(i)Imperative statement

(ii)Declarative statements

(iii)Assembler directive statements

Advantages amp disadvantages of Assembly language

Lecture_no19 Design Specification of 1-pass assembler with example

Following steps are used for design specification of an assembler-1)Identify the information necessary to perform a task2)Specify the data structure and define format of data structure3)Determine the processing necessary to obtain and maintain the information4)Determine the processing necessary to perform a task

Assembler Design can be done in

ndashSingle pass

ndashTwo pass

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 30

Source program

Pass1

OPTAB SYMTAB

Intermediate file

Pass2

SYMTAB

Object codes

OPTAB

SP NOTES

The Intermediate file include each source statement assigned address and error indicator

Pass I

Separate the symbols mnemonic op-code and operational fields Determine the storage requirement for every assembly language statement and up date the location

counter Build the symbol table (Table that is used to store each label and its corresponding value)

Pass II

Generate object code(perform actual translation)

One-Pass Assembler

The translation performed by an assembler is essentially a collection of substitutions machine operation code

for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines

bullSingle Pass AssemblerndashDoes everything in single passndashCannot resolve the forward referencing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 31

LOCCTR

SP NOTES

(detailed pass1 assembler flowchart)

Internal data structures1048698 the Operation Code Table (OPTAB) bull OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents1048698 the Symbol Table (SYMTAB) bull SYMTAB is used to store values (addresses) assigned to labels1048698 Location Counter (LOCCTR) bull This is a variable that is used to help in the assignment of address LOCCTR not LOCCTR + (instruction length)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 32

SP NOTES

Lecture_no20 Multi-pass assembler with example

Pass 1 Assign addresses to all statements in the program Save the values assigned to all labels for use in Pass 2 Perform some processing of assembler directivesPass 2 Assemble instructions Generate data values defined by BYTE WORD Perform processing of assembler directives not done in Pass 1 Write the object program and the assembly listing

(Detailed pass2 assembler flowchart)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 33

SP NOTES

Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler

The difference between one pass and two pass assemblers are-A one pass assembler passes over the source file exactly once in the same pass collecting the labels resolving future references and doing the actual assembly The difficult part is to resolve future label references and assemble code in one pass

A two pass assembler does two passes over the source file ( the second pass can be over a file generated in the first pass ) In the first pass all it does is looks for label definitions and introduces them in the symbol table In the second pass after the symbol table is complete it does the actual assembly by translating the operations and so on

Assembler Design Machine Dependent Assembler Features instruction formats and addressing modes program relocation

Machine Independent Assembler Features literals symbol-defining statements expressions program blocks control sections and program linking

Object Program

Header Col 1 H Col 2~7 Program name Col 8~13 Starting address (hex) Col 14-19 Length of object program in bytes (hex) Text Col1 T Col2~7 Starting address in this record (hex) Col 8~9 Length of object code in this record in bytes (hex) Col 10~69Object code (69-10+1)6=10 instructions End Col1 E Col 2~7 Address of 16 Col2 first executable instruction (hex) (END program_name)

Absolute ProgramThe program must be loaded at the specified (absolute) address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 34

SP NOTES

Relocatable Program

The actual starting address is not known until load time The program can be loaded at different address of memory

How to Solvea Modification recordb Base relativec Program counter relative

Modification record Col 1 M Col 2-7 Starting location of the address field to be modified relative to the beginning of the program Col 8-9 length of the address field to be modified in halfbytes

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 35

SP NOTES

Lecture_no22 Macro language amp Macro processor

MACRO PROCESSOR

The assembly language programmer often finds it necessary to repeat some statements or block of code several times in a program The block may consist of code to swap sets of registers do some arithmetic operations In this situation the programmer find a macro instruction facility useful Macro instruction (often called macros) are single line abbreviation for group of instructions In employing a macro the programmer essentially defines a single instruction to represent a block of code For every occurrence of this one-line macro instruction in his program the macro processing assembler substitute the entire block

A macro represents a commonly used group of statements in the source pro ramming language The macro processor replaces each macro instruction with the corresponding group of source language statements This is called expanding of macros

Obj

Basic Macro Processor Functions

Macro Definition and Usage

In this section we will present one more example to highlight salient aspects of macro-processor The example is very similar to Intels 8 bit microprocessor assembly language instruction

ExampleMACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 36

Source code (with macro)

Macro Processor

Expanded code

Compiler Assembler

SP NOTES

A macro definition is placed at the start of a program enclosed between the statements MACRO and ENDM A MACRO statement indicates that a macro definition starts while statement ENDM indicates the end of a macro definition Thus a group of statements start ing with MACRO and ending with ENDM constitutes one macro definition unit If many macros are to be defined in a program as many definition

modules will exist at the start of the program Each definition module contains a new operation and defines it to consist of a sequence of assembly language statement In example above INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence

The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement Appearance of a macro name in the mnemonic field amounts to a call on the macro The assembler replaces such a statement by the statement sequence comprising the macro This is known as macro expansion

INCRMT XY

is shown to lead to insertion of the assembly statements

LOAD X

ADD Y

STORE X

in its place All macro calls in a program are expanded in this fashion

Defining a Macro

Let us take another look at the macro definition unit

MACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

The macro header statement indicates the existence of a macro definition unit Absence of the header statement as the first statement of a program or ft first statement following a macro definition unit signals the start of the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 37

SP NOTES

main assembly language program The next statement in the definition unit is die prototype for a macro call This statement names the macro and indicates how the operands in any call on the macro would be written

The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion

Basic Macro Processor Functions

-gtDirectives used during usage of Macro

Macro Indicates begin of Macro MEND indicates end of Macro

-gtPrototype for Macro

Each argument starts with Name and macro Parameter list

MEND

-gtBODY The statement will be generated as the expansion of Macro

Macro Expansion

Each macro invocation statement will be expanded into the statements that form the body of the macro

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 38

SP NOTES

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)

In the definition of macro parameter In the macro invocation argument

Lecture_no23 Machine-Independent Macro Processor Features

MACRO INVOCATION

A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro

macro_name p1 p2 hellip

Difference between macro call and procedure call

Macro call statements of the macro body are expanded each time the macro is invoked

Procedure call statements of the subroutine appear only one regardless of how many times the subroutine is called

Macro parameters

i) Positional parameter

The prototype statement indicates how operands in a macro call would be written These operands are called parameters or arguments All parameters used in the prototype statement have names starting with the special character amp These parameters are known as formal parameters A macro call is written using parameter names which do not start with ft special character amp These are known as actual parameters

The lists of formal and actual parameters also called as formal and actual parameter lists specified in the prototype and macro call statements respectively establish a correspondence between each formal parameter and an actual parameter In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists Thus the first actual parameter in the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 39

SP NOTES

fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again

INCRMT ampA ampB prototype

INCRMT XY macro call

We see that X would be paired with ampA and Y with ampB While expanding a macro call any formal parameter appearing within a model statement is replaced by the corresponding actual parameter This is how expansion of the call INCR XY heads to the following statements

LOAD X

ADD Y

STORE X

Example

STRG DATA1 DATA2 DATA3

GENER DIRECT3

(ii) Keyword parameter

Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter

ExampleSTRG ampa3=DATA1 ampa2=DATA2 ampa1=DATA3

GENER TYPE=DIRECT CHANNEL=3

Features of macro facility

(1)Macro instruction argument

(2)Macro calls within macro(nested macro calls)

(3)Macro instruction defining(macros within macro)

(4)Conditional macro ExpansionLecture_no24 Design of 1-pass macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 40

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

The Intermediate file include each source statement assigned address and error indicator

Pass I

Separate the symbols mnemonic op-code and operational fields Determine the storage requirement for every assembly language statement and up date the location

counter Build the symbol table (Table that is used to store each label and its corresponding value)

Pass II

Generate object code(perform actual translation)

One-Pass Assembler

The translation performed by an assembler is essentially a collection of substitutions machine operation code

for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines

bullSingle Pass AssemblerndashDoes everything in single passndashCannot resolve the forward referencing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 31

LOCCTR

SP NOTES

(detailed pass1 assembler flowchart)

Internal data structures1048698 the Operation Code Table (OPTAB) bull OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents1048698 the Symbol Table (SYMTAB) bull SYMTAB is used to store values (addresses) assigned to labels1048698 Location Counter (LOCCTR) bull This is a variable that is used to help in the assignment of address LOCCTR not LOCCTR + (instruction length)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 32

SP NOTES

Lecture_no20 Multi-pass assembler with example

Pass 1 Assign addresses to all statements in the program Save the values assigned to all labels for use in Pass 2 Perform some processing of assembler directivesPass 2 Assemble instructions Generate data values defined by BYTE WORD Perform processing of assembler directives not done in Pass 1 Write the object program and the assembly listing

(Detailed pass2 assembler flowchart)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 33

SP NOTES

Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler

The difference between one pass and two pass assemblers are-A one pass assembler passes over the source file exactly once in the same pass collecting the labels resolving future references and doing the actual assembly The difficult part is to resolve future label references and assemble code in one pass

A two pass assembler does two passes over the source file ( the second pass can be over a file generated in the first pass ) In the first pass all it does is looks for label definitions and introduces them in the symbol table In the second pass after the symbol table is complete it does the actual assembly by translating the operations and so on

Assembler Design Machine Dependent Assembler Features instruction formats and addressing modes program relocation

Machine Independent Assembler Features literals symbol-defining statements expressions program blocks control sections and program linking

Object Program

Header Col 1 H Col 2~7 Program name Col 8~13 Starting address (hex) Col 14-19 Length of object program in bytes (hex) Text Col1 T Col2~7 Starting address in this record (hex) Col 8~9 Length of object code in this record in bytes (hex) Col 10~69Object code (69-10+1)6=10 instructions End Col1 E Col 2~7 Address of 16 Col2 first executable instruction (hex) (END program_name)

Absolute ProgramThe program must be loaded at the specified (absolute) address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 34

SP NOTES

Relocatable Program

The actual starting address is not known until load time The program can be loaded at different address of memory

How to Solvea Modification recordb Base relativec Program counter relative

Modification record Col 1 M Col 2-7 Starting location of the address field to be modified relative to the beginning of the program Col 8-9 length of the address field to be modified in halfbytes

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 35

SP NOTES

Lecture_no22 Macro language amp Macro processor

MACRO PROCESSOR

The assembly language programmer often finds it necessary to repeat some statements or block of code several times in a program The block may consist of code to swap sets of registers do some arithmetic operations In this situation the programmer find a macro instruction facility useful Macro instruction (often called macros) are single line abbreviation for group of instructions In employing a macro the programmer essentially defines a single instruction to represent a block of code For every occurrence of this one-line macro instruction in his program the macro processing assembler substitute the entire block

A macro represents a commonly used group of statements in the source pro ramming language The macro processor replaces each macro instruction with the corresponding group of source language statements This is called expanding of macros

Obj

Basic Macro Processor Functions

Macro Definition and Usage

In this section we will present one more example to highlight salient aspects of macro-processor The example is very similar to Intels 8 bit microprocessor assembly language instruction

ExampleMACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 36

Source code (with macro)

Macro Processor

Expanded code

Compiler Assembler

SP NOTES

A macro definition is placed at the start of a program enclosed between the statements MACRO and ENDM A MACRO statement indicates that a macro definition starts while statement ENDM indicates the end of a macro definition Thus a group of statements start ing with MACRO and ending with ENDM constitutes one macro definition unit If many macros are to be defined in a program as many definition

modules will exist at the start of the program Each definition module contains a new operation and defines it to consist of a sequence of assembly language statement In example above INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence

The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement Appearance of a macro name in the mnemonic field amounts to a call on the macro The assembler replaces such a statement by the statement sequence comprising the macro This is known as macro expansion

INCRMT XY

is shown to lead to insertion of the assembly statements

LOAD X

ADD Y

STORE X

in its place All macro calls in a program are expanded in this fashion

Defining a Macro

Let us take another look at the macro definition unit

MACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

The macro header statement indicates the existence of a macro definition unit Absence of the header statement as the first statement of a program or ft first statement following a macro definition unit signals the start of the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 37

SP NOTES

main assembly language program The next statement in the definition unit is die prototype for a macro call This statement names the macro and indicates how the operands in any call on the macro would be written

The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion

Basic Macro Processor Functions

-gtDirectives used during usage of Macro

Macro Indicates begin of Macro MEND indicates end of Macro

-gtPrototype for Macro

Each argument starts with Name and macro Parameter list

MEND

-gtBODY The statement will be generated as the expansion of Macro

Macro Expansion

Each macro invocation statement will be expanded into the statements that form the body of the macro

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 38

SP NOTES

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)

In the definition of macro parameter In the macro invocation argument

Lecture_no23 Machine-Independent Macro Processor Features

MACRO INVOCATION

A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro

macro_name p1 p2 hellip

Difference between macro call and procedure call

Macro call statements of the macro body are expanded each time the macro is invoked

Procedure call statements of the subroutine appear only one regardless of how many times the subroutine is called

Macro parameters

i) Positional parameter

The prototype statement indicates how operands in a macro call would be written These operands are called parameters or arguments All parameters used in the prototype statement have names starting with the special character amp These parameters are known as formal parameters A macro call is written using parameter names which do not start with ft special character amp These are known as actual parameters

The lists of formal and actual parameters also called as formal and actual parameter lists specified in the prototype and macro call statements respectively establish a correspondence between each formal parameter and an actual parameter In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists Thus the first actual parameter in the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 39

SP NOTES

fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again

INCRMT ampA ampB prototype

INCRMT XY macro call

We see that X would be paired with ampA and Y with ampB While expanding a macro call any formal parameter appearing within a model statement is replaced by the corresponding actual parameter This is how expansion of the call INCR XY heads to the following statements

LOAD X

ADD Y

STORE X

Example

STRG DATA1 DATA2 DATA3

GENER DIRECT3

(ii) Keyword parameter

Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter

ExampleSTRG ampa3=DATA1 ampa2=DATA2 ampa1=DATA3

GENER TYPE=DIRECT CHANNEL=3

Features of macro facility

(1)Macro instruction argument

(2)Macro calls within macro(nested macro calls)

(3)Macro instruction defining(macros within macro)

(4)Conditional macro ExpansionLecture_no24 Design of 1-pass macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 40

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

(detailed pass1 assembler flowchart)

Internal data structures1048698 the Operation Code Table (OPTAB) bull OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents1048698 the Symbol Table (SYMTAB) bull SYMTAB is used to store values (addresses) assigned to labels1048698 Location Counter (LOCCTR) bull This is a variable that is used to help in the assignment of address LOCCTR not LOCCTR + (instruction length)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 32

SP NOTES

Lecture_no20 Multi-pass assembler with example

Pass 1 Assign addresses to all statements in the program Save the values assigned to all labels for use in Pass 2 Perform some processing of assembler directivesPass 2 Assemble instructions Generate data values defined by BYTE WORD Perform processing of assembler directives not done in Pass 1 Write the object program and the assembly listing

(Detailed pass2 assembler flowchart)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 33

SP NOTES

Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler

The difference between one pass and two pass assemblers are-A one pass assembler passes over the source file exactly once in the same pass collecting the labels resolving future references and doing the actual assembly The difficult part is to resolve future label references and assemble code in one pass

A two pass assembler does two passes over the source file ( the second pass can be over a file generated in the first pass ) In the first pass all it does is looks for label definitions and introduces them in the symbol table In the second pass after the symbol table is complete it does the actual assembly by translating the operations and so on

Assembler Design Machine Dependent Assembler Features instruction formats and addressing modes program relocation

Machine Independent Assembler Features literals symbol-defining statements expressions program blocks control sections and program linking

Object Program

Header Col 1 H Col 2~7 Program name Col 8~13 Starting address (hex) Col 14-19 Length of object program in bytes (hex) Text Col1 T Col2~7 Starting address in this record (hex) Col 8~9 Length of object code in this record in bytes (hex) Col 10~69Object code (69-10+1)6=10 instructions End Col1 E Col 2~7 Address of 16 Col2 first executable instruction (hex) (END program_name)

Absolute ProgramThe program must be loaded at the specified (absolute) address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 34

SP NOTES

Relocatable Program

The actual starting address is not known until load time The program can be loaded at different address of memory

How to Solvea Modification recordb Base relativec Program counter relative

Modification record Col 1 M Col 2-7 Starting location of the address field to be modified relative to the beginning of the program Col 8-9 length of the address field to be modified in halfbytes

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 35

SP NOTES

Lecture_no22 Macro language amp Macro processor

MACRO PROCESSOR

The assembly language programmer often finds it necessary to repeat some statements or block of code several times in a program The block may consist of code to swap sets of registers do some arithmetic operations In this situation the programmer find a macro instruction facility useful Macro instruction (often called macros) are single line abbreviation for group of instructions In employing a macro the programmer essentially defines a single instruction to represent a block of code For every occurrence of this one-line macro instruction in his program the macro processing assembler substitute the entire block

A macro represents a commonly used group of statements in the source pro ramming language The macro processor replaces each macro instruction with the corresponding group of source language statements This is called expanding of macros

Obj

Basic Macro Processor Functions

Macro Definition and Usage

In this section we will present one more example to highlight salient aspects of macro-processor The example is very similar to Intels 8 bit microprocessor assembly language instruction

ExampleMACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 36

Source code (with macro)

Macro Processor

Expanded code

Compiler Assembler

SP NOTES

A macro definition is placed at the start of a program enclosed between the statements MACRO and ENDM A MACRO statement indicates that a macro definition starts while statement ENDM indicates the end of a macro definition Thus a group of statements start ing with MACRO and ending with ENDM constitutes one macro definition unit If many macros are to be defined in a program as many definition

modules will exist at the start of the program Each definition module contains a new operation and defines it to consist of a sequence of assembly language statement In example above INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence

The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement Appearance of a macro name in the mnemonic field amounts to a call on the macro The assembler replaces such a statement by the statement sequence comprising the macro This is known as macro expansion

INCRMT XY

is shown to lead to insertion of the assembly statements

LOAD X

ADD Y

STORE X

in its place All macro calls in a program are expanded in this fashion

Defining a Macro

Let us take another look at the macro definition unit

MACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

The macro header statement indicates the existence of a macro definition unit Absence of the header statement as the first statement of a program or ft first statement following a macro definition unit signals the start of the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 37

SP NOTES

main assembly language program The next statement in the definition unit is die prototype for a macro call This statement names the macro and indicates how the operands in any call on the macro would be written

The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion

Basic Macro Processor Functions

-gtDirectives used during usage of Macro

Macro Indicates begin of Macro MEND indicates end of Macro

-gtPrototype for Macro

Each argument starts with Name and macro Parameter list

MEND

-gtBODY The statement will be generated as the expansion of Macro

Macro Expansion

Each macro invocation statement will be expanded into the statements that form the body of the macro

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 38

SP NOTES

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)

In the definition of macro parameter In the macro invocation argument

Lecture_no23 Machine-Independent Macro Processor Features

MACRO INVOCATION

A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro

macro_name p1 p2 hellip

Difference between macro call and procedure call

Macro call statements of the macro body are expanded each time the macro is invoked

Procedure call statements of the subroutine appear only one regardless of how many times the subroutine is called

Macro parameters

i) Positional parameter

The prototype statement indicates how operands in a macro call would be written These operands are called parameters or arguments All parameters used in the prototype statement have names starting with the special character amp These parameters are known as formal parameters A macro call is written using parameter names which do not start with ft special character amp These are known as actual parameters

The lists of formal and actual parameters also called as formal and actual parameter lists specified in the prototype and macro call statements respectively establish a correspondence between each formal parameter and an actual parameter In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists Thus the first actual parameter in the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 39

SP NOTES

fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again

INCRMT ampA ampB prototype

INCRMT XY macro call

We see that X would be paired with ampA and Y with ampB While expanding a macro call any formal parameter appearing within a model statement is replaced by the corresponding actual parameter This is how expansion of the call INCR XY heads to the following statements

LOAD X

ADD Y

STORE X

Example

STRG DATA1 DATA2 DATA3

GENER DIRECT3

(ii) Keyword parameter

Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter

ExampleSTRG ampa3=DATA1 ampa2=DATA2 ampa1=DATA3

GENER TYPE=DIRECT CHANNEL=3

Features of macro facility

(1)Macro instruction argument

(2)Macro calls within macro(nested macro calls)

(3)Macro instruction defining(macros within macro)

(4)Conditional macro ExpansionLecture_no24 Design of 1-pass macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 40

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Lecture_no20 Multi-pass assembler with example

Pass 1 Assign addresses to all statements in the program Save the values assigned to all labels for use in Pass 2 Perform some processing of assembler directivesPass 2 Assemble instructions Generate data values defined by BYTE WORD Perform processing of assembler directives not done in Pass 1 Write the object program and the assembly listing

(Detailed pass2 assembler flowchart)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 33

SP NOTES

Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler

The difference between one pass and two pass assemblers are-A one pass assembler passes over the source file exactly once in the same pass collecting the labels resolving future references and doing the actual assembly The difficult part is to resolve future label references and assemble code in one pass

A two pass assembler does two passes over the source file ( the second pass can be over a file generated in the first pass ) In the first pass all it does is looks for label definitions and introduces them in the symbol table In the second pass after the symbol table is complete it does the actual assembly by translating the operations and so on

Assembler Design Machine Dependent Assembler Features instruction formats and addressing modes program relocation

Machine Independent Assembler Features literals symbol-defining statements expressions program blocks control sections and program linking

Object Program

Header Col 1 H Col 2~7 Program name Col 8~13 Starting address (hex) Col 14-19 Length of object program in bytes (hex) Text Col1 T Col2~7 Starting address in this record (hex) Col 8~9 Length of object code in this record in bytes (hex) Col 10~69Object code (69-10+1)6=10 instructions End Col1 E Col 2~7 Address of 16 Col2 first executable instruction (hex) (END program_name)

Absolute ProgramThe program must be loaded at the specified (absolute) address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 34

SP NOTES

Relocatable Program

The actual starting address is not known until load time The program can be loaded at different address of memory

How to Solvea Modification recordb Base relativec Program counter relative

Modification record Col 1 M Col 2-7 Starting location of the address field to be modified relative to the beginning of the program Col 8-9 length of the address field to be modified in halfbytes

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 35

SP NOTES

Lecture_no22 Macro language amp Macro processor

MACRO PROCESSOR

The assembly language programmer often finds it necessary to repeat some statements or block of code several times in a program The block may consist of code to swap sets of registers do some arithmetic operations In this situation the programmer find a macro instruction facility useful Macro instruction (often called macros) are single line abbreviation for group of instructions In employing a macro the programmer essentially defines a single instruction to represent a block of code For every occurrence of this one-line macro instruction in his program the macro processing assembler substitute the entire block

A macro represents a commonly used group of statements in the source pro ramming language The macro processor replaces each macro instruction with the corresponding group of source language statements This is called expanding of macros

Obj

Basic Macro Processor Functions

Macro Definition and Usage

In this section we will present one more example to highlight salient aspects of macro-processor The example is very similar to Intels 8 bit microprocessor assembly language instruction

ExampleMACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 36

Source code (with macro)

Macro Processor

Expanded code

Compiler Assembler

SP NOTES

A macro definition is placed at the start of a program enclosed between the statements MACRO and ENDM A MACRO statement indicates that a macro definition starts while statement ENDM indicates the end of a macro definition Thus a group of statements start ing with MACRO and ending with ENDM constitutes one macro definition unit If many macros are to be defined in a program as many definition

modules will exist at the start of the program Each definition module contains a new operation and defines it to consist of a sequence of assembly language statement In example above INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence

The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement Appearance of a macro name in the mnemonic field amounts to a call on the macro The assembler replaces such a statement by the statement sequence comprising the macro This is known as macro expansion

INCRMT XY

is shown to lead to insertion of the assembly statements

LOAD X

ADD Y

STORE X

in its place All macro calls in a program are expanded in this fashion

Defining a Macro

Let us take another look at the macro definition unit

MACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

The macro header statement indicates the existence of a macro definition unit Absence of the header statement as the first statement of a program or ft first statement following a macro definition unit signals the start of the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 37

SP NOTES

main assembly language program The next statement in the definition unit is die prototype for a macro call This statement names the macro and indicates how the operands in any call on the macro would be written

The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion

Basic Macro Processor Functions

-gtDirectives used during usage of Macro

Macro Indicates begin of Macro MEND indicates end of Macro

-gtPrototype for Macro

Each argument starts with Name and macro Parameter list

MEND

-gtBODY The statement will be generated as the expansion of Macro

Macro Expansion

Each macro invocation statement will be expanded into the statements that form the body of the macro

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 38

SP NOTES

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)

In the definition of macro parameter In the macro invocation argument

Lecture_no23 Machine-Independent Macro Processor Features

MACRO INVOCATION

A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro

macro_name p1 p2 hellip

Difference between macro call and procedure call

Macro call statements of the macro body are expanded each time the macro is invoked

Procedure call statements of the subroutine appear only one regardless of how many times the subroutine is called

Macro parameters

i) Positional parameter

The prototype statement indicates how operands in a macro call would be written These operands are called parameters or arguments All parameters used in the prototype statement have names starting with the special character amp These parameters are known as formal parameters A macro call is written using parameter names which do not start with ft special character amp These are known as actual parameters

The lists of formal and actual parameters also called as formal and actual parameter lists specified in the prototype and macro call statements respectively establish a correspondence between each formal parameter and an actual parameter In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists Thus the first actual parameter in the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 39

SP NOTES

fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again

INCRMT ampA ampB prototype

INCRMT XY macro call

We see that X would be paired with ampA and Y with ampB While expanding a macro call any formal parameter appearing within a model statement is replaced by the corresponding actual parameter This is how expansion of the call INCR XY heads to the following statements

LOAD X

ADD Y

STORE X

Example

STRG DATA1 DATA2 DATA3

GENER DIRECT3

(ii) Keyword parameter

Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter

ExampleSTRG ampa3=DATA1 ampa2=DATA2 ampa1=DATA3

GENER TYPE=DIRECT CHANNEL=3

Features of macro facility

(1)Macro instruction argument

(2)Macro calls within macro(nested macro calls)

(3)Macro instruction defining(macros within macro)

(4)Conditional macro ExpansionLecture_no24 Design of 1-pass macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 40

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler

The difference between one pass and two pass assemblers are-A one pass assembler passes over the source file exactly once in the same pass collecting the labels resolving future references and doing the actual assembly The difficult part is to resolve future label references and assemble code in one pass

A two pass assembler does two passes over the source file ( the second pass can be over a file generated in the first pass ) In the first pass all it does is looks for label definitions and introduces them in the symbol table In the second pass after the symbol table is complete it does the actual assembly by translating the operations and so on

Assembler Design Machine Dependent Assembler Features instruction formats and addressing modes program relocation

Machine Independent Assembler Features literals symbol-defining statements expressions program blocks control sections and program linking

Object Program

Header Col 1 H Col 2~7 Program name Col 8~13 Starting address (hex) Col 14-19 Length of object program in bytes (hex) Text Col1 T Col2~7 Starting address in this record (hex) Col 8~9 Length of object code in this record in bytes (hex) Col 10~69Object code (69-10+1)6=10 instructions End Col1 E Col 2~7 Address of 16 Col2 first executable instruction (hex) (END program_name)

Absolute ProgramThe program must be loaded at the specified (absolute) address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 34

SP NOTES

Relocatable Program

The actual starting address is not known until load time The program can be loaded at different address of memory

How to Solvea Modification recordb Base relativec Program counter relative

Modification record Col 1 M Col 2-7 Starting location of the address field to be modified relative to the beginning of the program Col 8-9 length of the address field to be modified in halfbytes

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 35

SP NOTES

Lecture_no22 Macro language amp Macro processor

MACRO PROCESSOR

The assembly language programmer often finds it necessary to repeat some statements or block of code several times in a program The block may consist of code to swap sets of registers do some arithmetic operations In this situation the programmer find a macro instruction facility useful Macro instruction (often called macros) are single line abbreviation for group of instructions In employing a macro the programmer essentially defines a single instruction to represent a block of code For every occurrence of this one-line macro instruction in his program the macro processing assembler substitute the entire block

A macro represents a commonly used group of statements in the source pro ramming language The macro processor replaces each macro instruction with the corresponding group of source language statements This is called expanding of macros

Obj

Basic Macro Processor Functions

Macro Definition and Usage

In this section we will present one more example to highlight salient aspects of macro-processor The example is very similar to Intels 8 bit microprocessor assembly language instruction

ExampleMACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 36

Source code (with macro)

Macro Processor

Expanded code

Compiler Assembler

SP NOTES

A macro definition is placed at the start of a program enclosed between the statements MACRO and ENDM A MACRO statement indicates that a macro definition starts while statement ENDM indicates the end of a macro definition Thus a group of statements start ing with MACRO and ending with ENDM constitutes one macro definition unit If many macros are to be defined in a program as many definition

modules will exist at the start of the program Each definition module contains a new operation and defines it to consist of a sequence of assembly language statement In example above INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence

The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement Appearance of a macro name in the mnemonic field amounts to a call on the macro The assembler replaces such a statement by the statement sequence comprising the macro This is known as macro expansion

INCRMT XY

is shown to lead to insertion of the assembly statements

LOAD X

ADD Y

STORE X

in its place All macro calls in a program are expanded in this fashion

Defining a Macro

Let us take another look at the macro definition unit

MACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

The macro header statement indicates the existence of a macro definition unit Absence of the header statement as the first statement of a program or ft first statement following a macro definition unit signals the start of the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 37

SP NOTES

main assembly language program The next statement in the definition unit is die prototype for a macro call This statement names the macro and indicates how the operands in any call on the macro would be written

The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion

Basic Macro Processor Functions

-gtDirectives used during usage of Macro

Macro Indicates begin of Macro MEND indicates end of Macro

-gtPrototype for Macro

Each argument starts with Name and macro Parameter list

MEND

-gtBODY The statement will be generated as the expansion of Macro

Macro Expansion

Each macro invocation statement will be expanded into the statements that form the body of the macro

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 38

SP NOTES

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)

In the definition of macro parameter In the macro invocation argument

Lecture_no23 Machine-Independent Macro Processor Features

MACRO INVOCATION

A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro

macro_name p1 p2 hellip

Difference between macro call and procedure call

Macro call statements of the macro body are expanded each time the macro is invoked

Procedure call statements of the subroutine appear only one regardless of how many times the subroutine is called

Macro parameters

i) Positional parameter

The prototype statement indicates how operands in a macro call would be written These operands are called parameters or arguments All parameters used in the prototype statement have names starting with the special character amp These parameters are known as formal parameters A macro call is written using parameter names which do not start with ft special character amp These are known as actual parameters

The lists of formal and actual parameters also called as formal and actual parameter lists specified in the prototype and macro call statements respectively establish a correspondence between each formal parameter and an actual parameter In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists Thus the first actual parameter in the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 39

SP NOTES

fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again

INCRMT ampA ampB prototype

INCRMT XY macro call

We see that X would be paired with ampA and Y with ampB While expanding a macro call any formal parameter appearing within a model statement is replaced by the corresponding actual parameter This is how expansion of the call INCR XY heads to the following statements

LOAD X

ADD Y

STORE X

Example

STRG DATA1 DATA2 DATA3

GENER DIRECT3

(ii) Keyword parameter

Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter

ExampleSTRG ampa3=DATA1 ampa2=DATA2 ampa1=DATA3

GENER TYPE=DIRECT CHANNEL=3

Features of macro facility

(1)Macro instruction argument

(2)Macro calls within macro(nested macro calls)

(3)Macro instruction defining(macros within macro)

(4)Conditional macro ExpansionLecture_no24 Design of 1-pass macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 40

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Relocatable Program

The actual starting address is not known until load time The program can be loaded at different address of memory

How to Solvea Modification recordb Base relativec Program counter relative

Modification record Col 1 M Col 2-7 Starting location of the address field to be modified relative to the beginning of the program Col 8-9 length of the address field to be modified in halfbytes

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 35

SP NOTES

Lecture_no22 Macro language amp Macro processor

MACRO PROCESSOR

The assembly language programmer often finds it necessary to repeat some statements or block of code several times in a program The block may consist of code to swap sets of registers do some arithmetic operations In this situation the programmer find a macro instruction facility useful Macro instruction (often called macros) are single line abbreviation for group of instructions In employing a macro the programmer essentially defines a single instruction to represent a block of code For every occurrence of this one-line macro instruction in his program the macro processing assembler substitute the entire block

A macro represents a commonly used group of statements in the source pro ramming language The macro processor replaces each macro instruction with the corresponding group of source language statements This is called expanding of macros

Obj

Basic Macro Processor Functions

Macro Definition and Usage

In this section we will present one more example to highlight salient aspects of macro-processor The example is very similar to Intels 8 bit microprocessor assembly language instruction

ExampleMACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 36

Source code (with macro)

Macro Processor

Expanded code

Compiler Assembler

SP NOTES

A macro definition is placed at the start of a program enclosed between the statements MACRO and ENDM A MACRO statement indicates that a macro definition starts while statement ENDM indicates the end of a macro definition Thus a group of statements start ing with MACRO and ending with ENDM constitutes one macro definition unit If many macros are to be defined in a program as many definition

modules will exist at the start of the program Each definition module contains a new operation and defines it to consist of a sequence of assembly language statement In example above INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence

The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement Appearance of a macro name in the mnemonic field amounts to a call on the macro The assembler replaces such a statement by the statement sequence comprising the macro This is known as macro expansion

INCRMT XY

is shown to lead to insertion of the assembly statements

LOAD X

ADD Y

STORE X

in its place All macro calls in a program are expanded in this fashion

Defining a Macro

Let us take another look at the macro definition unit

MACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

The macro header statement indicates the existence of a macro definition unit Absence of the header statement as the first statement of a program or ft first statement following a macro definition unit signals the start of the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 37

SP NOTES

main assembly language program The next statement in the definition unit is die prototype for a macro call This statement names the macro and indicates how the operands in any call on the macro would be written

The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion

Basic Macro Processor Functions

-gtDirectives used during usage of Macro

Macro Indicates begin of Macro MEND indicates end of Macro

-gtPrototype for Macro

Each argument starts with Name and macro Parameter list

MEND

-gtBODY The statement will be generated as the expansion of Macro

Macro Expansion

Each macro invocation statement will be expanded into the statements that form the body of the macro

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 38

SP NOTES

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)

In the definition of macro parameter In the macro invocation argument

Lecture_no23 Machine-Independent Macro Processor Features

MACRO INVOCATION

A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro

macro_name p1 p2 hellip

Difference between macro call and procedure call

Macro call statements of the macro body are expanded each time the macro is invoked

Procedure call statements of the subroutine appear only one regardless of how many times the subroutine is called

Macro parameters

i) Positional parameter

The prototype statement indicates how operands in a macro call would be written These operands are called parameters or arguments All parameters used in the prototype statement have names starting with the special character amp These parameters are known as formal parameters A macro call is written using parameter names which do not start with ft special character amp These are known as actual parameters

The lists of formal and actual parameters also called as formal and actual parameter lists specified in the prototype and macro call statements respectively establish a correspondence between each formal parameter and an actual parameter In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists Thus the first actual parameter in the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 39

SP NOTES

fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again

INCRMT ampA ampB prototype

INCRMT XY macro call

We see that X would be paired with ampA and Y with ampB While expanding a macro call any formal parameter appearing within a model statement is replaced by the corresponding actual parameter This is how expansion of the call INCR XY heads to the following statements

LOAD X

ADD Y

STORE X

Example

STRG DATA1 DATA2 DATA3

GENER DIRECT3

(ii) Keyword parameter

Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter

ExampleSTRG ampa3=DATA1 ampa2=DATA2 ampa1=DATA3

GENER TYPE=DIRECT CHANNEL=3

Features of macro facility

(1)Macro instruction argument

(2)Macro calls within macro(nested macro calls)

(3)Macro instruction defining(macros within macro)

(4)Conditional macro ExpansionLecture_no24 Design of 1-pass macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 40

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Lecture_no22 Macro language amp Macro processor

MACRO PROCESSOR

The assembly language programmer often finds it necessary to repeat some statements or block of code several times in a program The block may consist of code to swap sets of registers do some arithmetic operations In this situation the programmer find a macro instruction facility useful Macro instruction (often called macros) are single line abbreviation for group of instructions In employing a macro the programmer essentially defines a single instruction to represent a block of code For every occurrence of this one-line macro instruction in his program the macro processing assembler substitute the entire block

A macro represents a commonly used group of statements in the source pro ramming language The macro processor replaces each macro instruction with the corresponding group of source language statements This is called expanding of macros

Obj

Basic Macro Processor Functions

Macro Definition and Usage

In this section we will present one more example to highlight salient aspects of macro-processor The example is very similar to Intels 8 bit microprocessor assembly language instruction

ExampleMACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 36

Source code (with macro)

Macro Processor

Expanded code

Compiler Assembler

SP NOTES

A macro definition is placed at the start of a program enclosed between the statements MACRO and ENDM A MACRO statement indicates that a macro definition starts while statement ENDM indicates the end of a macro definition Thus a group of statements start ing with MACRO and ending with ENDM constitutes one macro definition unit If many macros are to be defined in a program as many definition

modules will exist at the start of the program Each definition module contains a new operation and defines it to consist of a sequence of assembly language statement In example above INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence

The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement Appearance of a macro name in the mnemonic field amounts to a call on the macro The assembler replaces such a statement by the statement sequence comprising the macro This is known as macro expansion

INCRMT XY

is shown to lead to insertion of the assembly statements

LOAD X

ADD Y

STORE X

in its place All macro calls in a program are expanded in this fashion

Defining a Macro

Let us take another look at the macro definition unit

MACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

The macro header statement indicates the existence of a macro definition unit Absence of the header statement as the first statement of a program or ft first statement following a macro definition unit signals the start of the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 37

SP NOTES

main assembly language program The next statement in the definition unit is die prototype for a macro call This statement names the macro and indicates how the operands in any call on the macro would be written

The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion

Basic Macro Processor Functions

-gtDirectives used during usage of Macro

Macro Indicates begin of Macro MEND indicates end of Macro

-gtPrototype for Macro

Each argument starts with Name and macro Parameter list

MEND

-gtBODY The statement will be generated as the expansion of Macro

Macro Expansion

Each macro invocation statement will be expanded into the statements that form the body of the macro

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 38

SP NOTES

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)

In the definition of macro parameter In the macro invocation argument

Lecture_no23 Machine-Independent Macro Processor Features

MACRO INVOCATION

A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro

macro_name p1 p2 hellip

Difference between macro call and procedure call

Macro call statements of the macro body are expanded each time the macro is invoked

Procedure call statements of the subroutine appear only one regardless of how many times the subroutine is called

Macro parameters

i) Positional parameter

The prototype statement indicates how operands in a macro call would be written These operands are called parameters or arguments All parameters used in the prototype statement have names starting with the special character amp These parameters are known as formal parameters A macro call is written using parameter names which do not start with ft special character amp These are known as actual parameters

The lists of formal and actual parameters also called as formal and actual parameter lists specified in the prototype and macro call statements respectively establish a correspondence between each formal parameter and an actual parameter In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists Thus the first actual parameter in the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 39

SP NOTES

fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again

INCRMT ampA ampB prototype

INCRMT XY macro call

We see that X would be paired with ampA and Y with ampB While expanding a macro call any formal parameter appearing within a model statement is replaced by the corresponding actual parameter This is how expansion of the call INCR XY heads to the following statements

LOAD X

ADD Y

STORE X

Example

STRG DATA1 DATA2 DATA3

GENER DIRECT3

(ii) Keyword parameter

Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter

ExampleSTRG ampa3=DATA1 ampa2=DATA2 ampa1=DATA3

GENER TYPE=DIRECT CHANNEL=3

Features of macro facility

(1)Macro instruction argument

(2)Macro calls within macro(nested macro calls)

(3)Macro instruction defining(macros within macro)

(4)Conditional macro ExpansionLecture_no24 Design of 1-pass macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 40

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

A macro definition is placed at the start of a program enclosed between the statements MACRO and ENDM A MACRO statement indicates that a macro definition starts while statement ENDM indicates the end of a macro definition Thus a group of statements start ing with MACRO and ending with ENDM constitutes one macro definition unit If many macros are to be defined in a program as many definition

modules will exist at the start of the program Each definition module contains a new operation and defines it to consist of a sequence of assembly language statement In example above INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence

The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement Appearance of a macro name in the mnemonic field amounts to a call on the macro The assembler replaces such a statement by the statement sequence comprising the macro This is known as macro expansion

INCRMT XY

is shown to lead to insertion of the assembly statements

LOAD X

ADD Y

STORE X

in its place All macro calls in a program are expanded in this fashion

Defining a Macro

Let us take another look at the macro definition unit

MACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM

INCRMT XY LOAD X macroADD Y expansionSTORE X

ENDM Macro Program

The macro header statement indicates the existence of a macro definition unit Absence of the header statement as the first statement of a program or ft first statement following a macro definition unit signals the start of the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 37

SP NOTES

main assembly language program The next statement in the definition unit is die prototype for a macro call This statement names the macro and indicates how the operands in any call on the macro would be written

The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion

Basic Macro Processor Functions

-gtDirectives used during usage of Macro

Macro Indicates begin of Macro MEND indicates end of Macro

-gtPrototype for Macro

Each argument starts with Name and macro Parameter list

MEND

-gtBODY The statement will be generated as the expansion of Macro

Macro Expansion

Each macro invocation statement will be expanded into the statements that form the body of the macro

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 38

SP NOTES

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)

In the definition of macro parameter In the macro invocation argument

Lecture_no23 Machine-Independent Macro Processor Features

MACRO INVOCATION

A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro

macro_name p1 p2 hellip

Difference between macro call and procedure call

Macro call statements of the macro body are expanded each time the macro is invoked

Procedure call statements of the subroutine appear only one regardless of how many times the subroutine is called

Macro parameters

i) Positional parameter

The prototype statement indicates how operands in a macro call would be written These operands are called parameters or arguments All parameters used in the prototype statement have names starting with the special character amp These parameters are known as formal parameters A macro call is written using parameter names which do not start with ft special character amp These are known as actual parameters

The lists of formal and actual parameters also called as formal and actual parameter lists specified in the prototype and macro call statements respectively establish a correspondence between each formal parameter and an actual parameter In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists Thus the first actual parameter in the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 39

SP NOTES

fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again

INCRMT ampA ampB prototype

INCRMT XY macro call

We see that X would be paired with ampA and Y with ampB While expanding a macro call any formal parameter appearing within a model statement is replaced by the corresponding actual parameter This is how expansion of the call INCR XY heads to the following statements

LOAD X

ADD Y

STORE X

Example

STRG DATA1 DATA2 DATA3

GENER DIRECT3

(ii) Keyword parameter

Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter

ExampleSTRG ampa3=DATA1 ampa2=DATA2 ampa1=DATA3

GENER TYPE=DIRECT CHANNEL=3

Features of macro facility

(1)Macro instruction argument

(2)Macro calls within macro(nested macro calls)

(3)Macro instruction defining(macros within macro)

(4)Conditional macro ExpansionLecture_no24 Design of 1-pass macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 40

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

main assembly language program The next statement in the definition unit is die prototype for a macro call This statement names the macro and indicates how the operands in any call on the macro would be written

The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion

Basic Macro Processor Functions

-gtDirectives used during usage of Macro

Macro Indicates begin of Macro MEND indicates end of Macro

-gtPrototype for Macro

Each argument starts with Name and macro Parameter list

MEND

-gtBODY The statement will be generated as the expansion of Macro

Macro Expansion

Each macro invocation statement will be expanded into the statements that form the body of the macro

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 38

SP NOTES

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)

In the definition of macro parameter In the macro invocation argument

Lecture_no23 Machine-Independent Macro Processor Features

MACRO INVOCATION

A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro

macro_name p1 p2 hellip

Difference between macro call and procedure call

Macro call statements of the macro body are expanded each time the macro is invoked

Procedure call statements of the subroutine appear only one regardless of how many times the subroutine is called

Macro parameters

i) Positional parameter

The prototype statement indicates how operands in a macro call would be written These operands are called parameters or arguments All parameters used in the prototype statement have names starting with the special character amp These parameters are known as formal parameters A macro call is written using parameter names which do not start with ft special character amp These are known as actual parameters

The lists of formal and actual parameters also called as formal and actual parameter lists specified in the prototype and macro call statements respectively establish a correspondence between each formal parameter and an actual parameter In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists Thus the first actual parameter in the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 39

SP NOTES

fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again

INCRMT ampA ampB prototype

INCRMT XY macro call

We see that X would be paired with ampA and Y with ampB While expanding a macro call any formal parameter appearing within a model statement is replaced by the corresponding actual parameter This is how expansion of the call INCR XY heads to the following statements

LOAD X

ADD Y

STORE X

Example

STRG DATA1 DATA2 DATA3

GENER DIRECT3

(ii) Keyword parameter

Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter

ExampleSTRG ampa3=DATA1 ampa2=DATA2 ampa1=DATA3

GENER TYPE=DIRECT CHANNEL=3

Features of macro facility

(1)Macro instruction argument

(2)Macro calls within macro(nested macro calls)

(3)Macro instruction defining(macros within macro)

(4)Conditional macro ExpansionLecture_no24 Design of 1-pass macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 40

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)

In the definition of macro parameter In the macro invocation argument

Lecture_no23 Machine-Independent Macro Processor Features

MACRO INVOCATION

A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro

macro_name p1 p2 hellip

Difference between macro call and procedure call

Macro call statements of the macro body are expanded each time the macro is invoked

Procedure call statements of the subroutine appear only one regardless of how many times the subroutine is called

Macro parameters

i) Positional parameter

The prototype statement indicates how operands in a macro call would be written These operands are called parameters or arguments All parameters used in the prototype statement have names starting with the special character amp These parameters are known as formal parameters A macro call is written using parameter names which do not start with ft special character amp These are known as actual parameters

The lists of formal and actual parameters also called as formal and actual parameter lists specified in the prototype and macro call statements respectively establish a correspondence between each formal parameter and an actual parameter In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists Thus the first actual parameter in the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 39

SP NOTES

fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again

INCRMT ampA ampB prototype

INCRMT XY macro call

We see that X would be paired with ampA and Y with ampB While expanding a macro call any formal parameter appearing within a model statement is replaced by the corresponding actual parameter This is how expansion of the call INCR XY heads to the following statements

LOAD X

ADD Y

STORE X

Example

STRG DATA1 DATA2 DATA3

GENER DIRECT3

(ii) Keyword parameter

Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter

ExampleSTRG ampa3=DATA1 ampa2=DATA2 ampa1=DATA3

GENER TYPE=DIRECT CHANNEL=3

Features of macro facility

(1)Macro instruction argument

(2)Macro calls within macro(nested macro calls)

(3)Macro instruction defining(macros within macro)

(4)Conditional macro ExpansionLecture_no24 Design of 1-pass macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 40

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again

INCRMT ampA ampB prototype

INCRMT XY macro call

We see that X would be paired with ampA and Y with ampB While expanding a macro call any formal parameter appearing within a model statement is replaced by the corresponding actual parameter This is how expansion of the call INCR XY heads to the following statements

LOAD X

ADD Y

STORE X

Example

STRG DATA1 DATA2 DATA3

GENER DIRECT3

(ii) Keyword parameter

Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter

ExampleSTRG ampa3=DATA1 ampa2=DATA2 ampa1=DATA3

GENER TYPE=DIRECT CHANNEL=3

Features of macro facility

(1)Macro instruction argument

(2)Macro calls within macro(nested macro calls)

(3)Macro instruction defining(macros within macro)

(4)Conditional macro ExpansionLecture_no24 Design of 1-pass macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 40

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Single passraquo every macro must be defined before it is calledraquo one-pass processor can alternate between macro definition and macro expansionraquo nested macro definitions may be allowed but nested calls are not

Step 1

Scan all macro definitions one by one Foi each macro defined

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of a macro can be

found in MDT

Step 2

Examine all statements in the assembly source program to detect macro calls For each macro call

i) locate the macro in MNT

ii) obtain information from MNT regarding position of the macro definition in MDT

iii) process the macro call statement to establish correspondence between all formal

parameters and their values (ie actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables

Lecture_no25 Design of pass-2 macro processor

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 41

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Macro vs SubroutineMacro- the statement of expansion are generated each time the macro are invoked

Subroutine- the statement in a subroutine appears only once

Differences between Macro and Subroutine After macro processing the expanded file can be used as input to the assembler The statements generated from the macro expansions will be assembled exactly asthough they had been written directly by the programmer

The differences between macro invocation and subroutine callo The statements that form the body of the macro are generated each time amacro is expandedo Statements in a subroutine appear only once regardless of how many times thesubroutine is called

One-Pass Macro Processor -gt Prerequisite every macro must be defined before it is called -gtSub-procedures macro definition DEFINE macro invocation EXPAND

What are the data structures used by the macro processor Data Structures o MACRO DEFINITION TABLE (DEFTAB) Macro prototype States that make up the macro body Reference to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB) Role an index to DEFTAB Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB) Used during the expansion Invocation Arguments are stored in ARGTAB according to their position

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 42

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

MACRO NAMTAB

CALL DEFTAB

ARGTAB

Two pass algorithm

raquo Pass1 Recognize macro definitions

raquo Pass2 Recognize macro calls

raquo nested macro definitions are not allowed

Write an algorithm amp flowchart for the Macro processor and explain

Lecture_no26 Loader and LinkerMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 43

PROCESSLINEDEFINE

EXPAND

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution

Definition of Loader Loader is utility program which takes object code as input prepares it for execution and loads the executable code into the memory Thus loader is actually responsible for initiating the execution processFunctions of Loader

The loader is responsible for the activities such as allocation linking relocation and loading1) It allocates the space for program in the memory by calculating the size of the program This activity is called allocation2) It resolves the symbolic references (codedata) between the object modules by assigning all the user subroutine and library subroutine addresses This activity is called linking3) There are some address dependent locations in the program such address constants must be adjusted according to allocated space such activity done by loader is called relocation4) Finally it places all the machine instructions and data of corresponding programs and subroutines into the memory Thus program now becomes ready for execution this activity is called loading

Types of Loader schemes

1)ldquocompile and gordquo loader- In this type of loader the instruction is readline by line its machine code is obtained and it is directly put in the mainmemory at some known address

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 44

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory

2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)

Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 45

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

3)Absolute Loader Absolute loader is a kind of loader in which relocated object files are created loader accepts these files and places them at specified locations in the memory This type of loader is called absolute because no relocation information is needed

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 46

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Absolute loader As the name itself says it does nothing other than loading

Advantages It is simple to implementDisadvantages

In this scheme it is the programmers duty to adjust all the inter segment addresses and manually do the linking activity For that it is necessary for a programmer to know the memory management

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 47

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Lecture_no27 Machine-Dependent Loader Features ndash Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer or its translated version is rarely of a stand-alone nature That is a program generally cannot execute on its own without requiring the presence of some other programs in the computers memory

For example consider a program written in high level languages like C Such a program may contain calls on certain InputOutput functions like Printf ( ) Scanf ( ) etc which am not written by the programmer himself During program execution those standard functions must reside into the main memory Furthermore everytime an InputOutput function is called by a C language program control should get transferred to the appropriate function The linking function makes address of programs known to each other so that such transfers can take place during the execution

Relocation

Another function commonly performed by a loader is that of program relocation This function can be explained as follows Assume that a program written in C ( let us call it A) calls standard function Printf ( ) A and Printf ( ) would have to be linked with each other But where is main storage shall we load A and Printf ( ) A possible solution would be to load them according to the addresses assigned when they were U~Wamp For example as translated A might be given stone area from 200 to 300 while Printf ( )function occupies area from 100 to 150

If we were to load these programs at their translated addresses a lot of storage lying between them may go waste Another possibility is that both A and Printf ( ) may have been translated with the identical start address of 100 7bus A extends from 100 to 200 while Printf ( ) extends from 100 to 1 50 But there is simply no way A and Printf ( )can co-exist at same storage location Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste

It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program

The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information dust is treated as an entity be it a program or data It is possible to produce multiple program or data segment in a single source file) The pan of a loader which performs relocation is called relocating loader

Relocating Loader To avoid possible reassembling of all subroutines when a single sub-routine is changed and to perform the tasks of allocation and linking for the programmer The general class of relocating loader was introduced

The output of a relocating loader is the object program and information about all other programs it references In addition there is information (relocation information) as to location in this program that need to be changed if it is to be loaded in an arbitrary location in memory Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used It has the advantage of allowing the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 48

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs

Lecture_no28 Loader Design Option

Machine independent loader feature

Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader

bullLinkage editor-

Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution

Dynamic Linking-

Linking function is performed at execution time-gt Postpone the linking function until execution time raquo A loaded and subroutine is linked to the rest of the program when it is first called Dynamic linking dynamic loading or load on call

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 49

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

-gtAllow several executing programs to share one copy of a subroutine or library

bullLinking loader-

A linking loaders performs

raquo All linking and relocation operationsraquo Automatic library searchraquo Loads the linked program directly into memory for execution

bullOverlay

Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments overlay techniques can be a good way to squeeze programs

Overlaid programs divide the code into a tree of segments

A typical overlay tree ROOT calls A and D A calls B and C D calls E and F

The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E

When the program starts the system loads the root segment which contains the entry point of the program Each time a routine makes a downward inter-segment call the overlay manager ensures that the path to the call target is loaded For example if the root calls a routine in segment A the overlay manager loads section A if its not already loaded If a routine in A calls a routine in B the manager has to ensure that B is loaded and if a routine in the root calls a routine in B the manager ensures that both A and B are loaded Upwards calls dont require any linker help since the entire path from the root is already loaded

(fig Overlay)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 50

A D

CB FE

ROOT

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

The essential difference between a linkage editor and a linking loader

Lecture_no29 Design of Direct linking loaderMrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 51

Linking loader

Memory

Library

Object program(s)Object program(s)

Library

Linked program

Linkage editor

Relocating loader

Memory

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses

Advantages1 The overhead on the loader is reduced The required subroutine will be load in the main memory only at the time of execution2 The system can be dynamically reconfiguredDisadvantages The linking and loading need to be postponed until the execution During the execution if at all any subroutine is needed then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory

-gtLets take a quick look of our three loaders absolute loader relocating loader and direct linking loader respectively for the above four functions

Allocation - Programmer Programmer ProgrammerLinking - Programmer Programmer LoaderRelocation - Assembler Loader AssemblerLoading - Loader Loader Loader

Subroutine Linkages-

bull The main program A wishes to transfer to subprogram B bull The programmer in program A could write a transfer instruction (e g BSR B) to subprogram B bull The assembler does not know the value of this symbol reference and will declare it as an error

Design of Direct linking loader

pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols

1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 52

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

7END card

pass2

-gtLoad the actual program text -gtPerform the relocation modification of any address constants needing to be altered -gtResolve external references (linking)

copy of object program IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the

address to load the first segmentPLA counter (Program Load address ) to keep track of each segment location External Symbol Directory card(ESD) to store each external symbol and its corresponding assigned

core addressLocal External Symbol Array(LESA) to establish correspondence between ESD ID used in ESD amp

RLD cardsESD (external symbol dictionary)

Data Structures

Algorithm

Binary Symbolic Subroutine Loader- The assembler assembles Provides the loader

-gtObject program + relocation information -gtPrefixed with information about all other program it references (transfer vector) -gtThe length of the entire program -gtThe length of the transfer vector portion

Binder-bull A binder is a program that performs the same functions as the direct linking loader

ndash allocation relocation and linking bull Outputs the text in a file rather than memory

ndash called a load module

Lecture_no30 Programming Language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 53

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

The role of Programming Language

-gtNot only in designing and implementing but in teaching and supporting it -gt Language designer s balance -gtmaking computing convenient for people with -gtmaking efficient use of computing machines

High-level language(HLL)

A high-level language is an advanced computer programming language that isnt limited by the computer designed for a specific job and is easier to understand Today there are dozens of high-level languages some examples include BASIC C FORTRAN Java and Pascal

Importance of HLL

Lecture_no 31 Details of features of HLL

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 54

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Data types and data structures

Storage allocation and scope of names

Accessing flexibility

Function modularity

Asynchronous operation

Benefits of higher -level language

-Readability-Machine independent (portable)-Availability of program libraries-Easy error-checking

Lecture_no32 Table processing

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 55

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Sorting and Searching

Objectives

To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort

and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing

1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search

Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array

bull ordered table

bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function

Sorting

We will discuss four comparison-sort algorithms

1 selection sort 2 insertion sort 3 merge sort 4 quick sort

Lecture_no33 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 56

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Lecture_no34 MODULE-3

Formal systems and programming language

A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics Euclids Elements is often held to be the first formal system and displays the characteristic of a formal system

Formal systems in mathematics consist of the following elements

1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)

2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not

3 A set of axioms or axiom schemata each axiom must be a wff4 A set of inference rules

Components of a Formal System-

A formal system has the following components

A finite alphabet of symbols The alphabet must be finite because if it were not each symbol could stand for any thought and so what kind of a model would that be There would be no need to process anything Furthermore we are finite beings and lets keep in mind what we are trying to model

A syntax that defines which strings of symbol are in the language of our formal system A decidable set of axioms and a finite set of rules from which the set of theorems of the system is

generated The rules must take a finite number of steps to apply

Formal Language-

A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects

the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)

the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question

Formal grammar-

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 57

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language

Rule of Inference

A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion

Examples of Formal system-

Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc

Grammar-

It is often convenient to specify languages in terms of grammars

Lecture_no35 Formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 58

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

formal specifications are mathematically based techniques whose purpose are to help with the implementation of systems and software They are used to describe a system to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools

Approaching a formalism-

DEFINITION 1

-gtAn alphabet T is a finite set of symbols(Terminal symbols)

-gtA formula (also called a string or a sentence)is a concatenation of symbols

DEFINITION 2-

A Language L is a subset of the set of finite concatenations of symbols in an alphabet T

We write this Lsub T

Lecture_no36 Development of formal specification

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 59

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

To construct a Grammar G3 to describe English sentences one might let the alphabet T comprise all English words rather than lettersV would contain non terminals that correspond to the structural components in an English sentence

Production Rule-

A Rule of the form s -gt α where s is symbol α is a sequence of symbols is called a Production Rule

Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules

Terminal Symbols-

Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)

Lecture_no37 Formal grammars derivation of sentences

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 60

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Formal Definition

bullThe Derivation of sentences-

A grammar is a finite object that can describe an infinite language

DEFINITION

A string γ is immediately derived from a string micro in a grammar G

micro -gt G γ is a single step derivation if and only if

(1) micro = σ α τ (2) γ = σ β τ(3) α -gt β

ϵ P of G

where σ and τ represent arbitrary strings

Sentential forms and sentences

DEFINITION-

A sentential form is any string which can be derived from the starting symbol sum

bullChomsky Hierarchy-

Hierarchy of Grammars

The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey

A Type 1 grammar is a Type 0 grammar ltN P Sgt that satisfies the following two conditions

a Each production rule in P satisfies | | | | if it is not of the form S b If S is in P then S does not appear in the right-hand side of any production rule

A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 61

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)

A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language

Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol

An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar

A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions

a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol

A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3

An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar

Below Figure illustrates the hierarchy of the different types of grammars

Figure Hierarchy of grammars

Lecture_no38 context sensitive vs context free grammar

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar

Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton

Deterministic pushdown automata are somewhat less expressive

Context-sensitive grammars and languages

A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)

A context-sensitive language is a language generated by a context-sensitive grammar

The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied

A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton

Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Context-free grammar Vs Context-sensitive grammar

In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form

V rarr w

where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it

Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language

Lecture_no 39 Backus-Naur Form(BNF)

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory

Introduction to BNF-

A BNF specification is a set of derivation rules written as

ltsymbolgt = __expression__

where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt

The = means that the symbol on the left must be replaced with the expression on the right

Formal Grammar-

In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form

Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Lecture_no 40 Canonic System

Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Lecture_no41 Examples on Canonic system

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Lecture_no 42 Recognition and Translation Algorithm

Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets

This algorithm is capable of recognizing strings produced by Backus-Naur system

In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Lecture_no 43 Compiler

Compiler are basically ldquoLanguage Translatorrdquo

ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language

-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)

Source program---------gt Compiler -------------gt Target program

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Computer Analysis Synthesis Model -

In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an

equivalent target program

-gtAnalysis Partbull The analysis part is carried out in three sub-parts

ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source

string

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

ndash Semantic Analysis bull In this step the meaning of the source string is determined

Properties of Compiler-

bull It must be bug free

bull It must generate correct machine code

bull The generated machine code must run fast

bull The compiler itself must run fast (compilation time must be proportional to program size)

bull The compiler must be portable (ie modular supporting separate compilation)

bull It must give good diagnostics and error messages

bull The generated code must work well with existing debuggers

Phases of a compiler-

ndash Lexical Analysis

bull It is also called scanning

bull It breaks the complete source code into tokens

For example total = count + rate 10

bull Then in lexical analysis phase this statement is broken up into series of tokens as follows

ndash The identifier total

ndash The assignment symbol

ndash The identifier count

ndash The plus sign

ndash The identifier rate

ndash The multiplication sign

ndash The constant number 10

- Syntax Analysis

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

bull It is also called parsing

bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure

bull It determines the structure of the source string by grouping the token together

bull The hierarchical structure generated in this phase is called parse tree or syntax tree

Lecture_no 44 functions of each phase of a compiler(partI)

The Structure of a Compiler

Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink

The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language

The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language

This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable

Multiple Phases

The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code

The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate

Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

the latter constituting the output of the lexical analyzer For example any one of the following

x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not

x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and

A token is a lttoken-nameattribute-valuegt pair For example

1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases

2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =

3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to

ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table

6 The lexeme is mapped to the token ltgt

Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception

Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)

Syntax Analysis (or Parsing)

Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example

x3 = y + 3 would be parsed into the tree on the right

This parsing would result from a grammar containing rules such as

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

asst-stmt rarr id = expr expr rarr number | id | expr + expr

Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right

The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it

(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator

Semantic Analysis

There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary

For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right

Intermediate code generation

Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target

With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce

temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Sometimes three-address code is called quadruples because one can view the previous code sequence as

inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form

operation target source1 source2

Code optimization

This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see

1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with

2 add temp2 id2 30

3 The last two quads can be combined into 4 realtoint id1 temp2

Code generation

Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2

LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2

127 Symbol-Table Management

The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location

A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Lecture_no45 Phases of a compiler(part II)

The Grouping of Phases into Passes

Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass

For example one could have the entire front end as one pass

The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking

As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code

Compilers

1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units

called token (token type token value)

2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language

3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space

4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as

possible

5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the

programming language

6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)

7 Ambiguous GrammarThere is more than one possible parse tree for a given statement

Lecture_no 46 Passes of a compiler

Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision

(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler

Compiler Driver calls

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

Syntactic Analyzercalls calls

Contextual Analyzer Code Generator

(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler

Compiler Driver

calls calls calls

Syntactic Analyzer Contextual Analyzer Code GeneratorI

I O I O O

Source Text AST Decorated AST object code

Classifying compilers by number of passes has its background in the hardware resource limitations of computers

Compiling involves performing lots of work and early computers did not have enough memory to contain one program

that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or

some representation of it) performing some of the required analysis and translations

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one

pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be

compiled in a single pass (eg Pascal)

Lecture_no47 Surprise Test

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes

SP NOTES

THANK YOU

Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80

  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • PREREQUISITES
  • One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
  • In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
  • AIM
  • To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
  • OBJECTIVES
  • To understand the relationship between system software and machine architecture
  • To know the design and implementation of Assemblers
  • To know the design and implementation of Linkers and Loaders
  • To have an understanding of macroprocessors
  • To know the function of Compiler
  • EXPECTED OUTCOME
  • This course provides knowledge to design various system programs
  • Assembler
  • Compiler
  • Interpreter
  • One-Pass Assembler
  • The translation performed by an assembler is essentially a collection of substitutions machine operation code
  • for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
  • MACRO PROCESSOR
    • Macro Definition and Usage
      • Example
      • Defining a Macro
      • Linking
      • Relocation
          • High-level language(HLL)
          • Sorting and Searching
            • Objectives
              • Sorting
                • Context-free grammars and languages
                • Context-sensitive grammars and languages
                • Context-free grammar Vs Context-sensitive grammar
                  • Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
                    • ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
                    • string
                    • ndash Semantic Analysis bull In this step the meaning of the source string is determined
                        • The Structure of a Compiler
                          • Multiple Phases
                          • Lexical Analysis (or Scanning)
                          • Syntax Analysis (or Parsing)
                          • Semantic Analysis
                          • Intermediate code generation
                          • Code optimization
                          • Code generation
                          • 127 Symbol-Table Management
                          • The Grouping of Phases into Passes